Learning Objectives:
- Apply spatial operations to answer policy-relevant research questions
- Integrate census demographic data with spatial analysis
- Create publication-quality visualizations and maps
- Work with spatial data from multiple sources
- Communicate findings effectively for policy audiences
Part 1: Healthcare Access for Vulnerable Populations
Research Question
Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?
Your analysis should identify counties that should be priorities for healthcare investment and policy intervention.
Required Analysis Steps
Complete the following analysis, documenting each step with code and brief explanations:
Step 1: Data Collection (5 points)
Load the required spatial data:
- Pennsylvania county boundaries
- Pennsylvania hospitals (from lecture data)
- Pennsylvania census tracts
Simple feature collection with 6 features and 11 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -80.27907 ymin: 39.80913 xmax: -75.17005 ymax: 40.24273
Geodetic CRS: WGS 84
CHIEF_EXEC CHIEF_EX_1
1 Peter J Adamo President
2 Autumn DeShields Chief Executive Officer
3 Shawn Parekh Chief Executive Officer
4 DIANE HRITZ Chief Executive Officer
5 Tim Harclerode Chief Executive Officer
6 Richard McLaughlin MD MBA Chief Executive Officer
FACILITY_U LONGITUDE COUNTY
1 https://www.phhealthcare.org -79.91131 Washington
2 https://www.malvernbh.com -75.17005 Philadelphia
3 https://roxboroughmemorial.com -75.20963 Philadelphia
4 https://www.ashospital.net -80.27907 Washington
5 https://www.conemaugh.org -79.02513 Somerset
6 https://towerhealth.org -75.61213 Montgomery
FACILITY_N STREET
1 Penn Highlands Mon Valley 1163 Country Club Road
2 MALVERN BEHAVIORAL HEALTH 1930 South Broad Street Unit 4
3 Roxborough Memorial Hospital 5800 Ridge Avenue
4 ADVANCED SURGICAL HOSPITAL 100 TRICH DRIVE\nSUITE 1
5 DLP Conemaugh Meyersdale Medical Center 200 Hospital Drive
6 Pottstown Hospital, LLC 1600 East High Street
CITY_OR_BO LATITUDE TELEPHONE_ ZIP_CODE geometry
1 Monongahela 40.18193 724-258-1000 15063 POINT (-79.91131 40.18193)
2 Philadelphia 39.92619 610-480-8919 19145 POINT (-75.17005 39.9262)
3 Philadelphia 40.02869 215-483-9900 19128 POINT (-75.20963 40.02869)
4 WASHINGTON 40.15655 7248840710 15301 POINT (-80.27907 40.15655)
5 Meyersdale 39.80913 814-634-5911 15552 POINT (-79.02513 39.80913)
6 Pottstown 40.24273 6103277000 19464 POINT (-75.61213 40.24273)
head(census_tracts)
Simple feature collection with 6 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -78.42478 ymin: 39.79351 xmax: -75.93766 ymax: 40.54328
Geodetic CRS: NAD83
STATEFP COUNTYFP TRACTCE GEOIDFQ GEOID NAME
1 42 001 031101 1400000US42001031101 42001031101 311.01
2 42 013 100400 1400000US42013100400 42013100400 1004
3 42 013 100500 1400000US42013100500 42013100500 1005
4 42 013 100800 1400000US42013100800 42013100800 1008
5 42 013 101900 1400000US42013101900 42013101900 1019
6 42 011 011200 1400000US42011011200 42011011200 112
NAMELSAD STUSPS NAMELSADCO STATE_NAME LSAD ALAND AWATER
1 Census Tract 311.01 PA Adams County Pennsylvania CT 3043185 0
2 Census Tract 1004 PA Blair County Pennsylvania CT 993724 0
3 Census Tract 1005 PA Blair County Pennsylvania CT 1130204 0
4 Census Tract 1008 PA Blair County Pennsylvania CT 996553 0
5 Census Tract 1019 PA Blair County Pennsylvania CT 573726 0
6 Census Tract 112 PA Berks County Pennsylvania CT 1539365 9308
geometry
1 MULTIPOLYGON (((-77.03108 3...
2 MULTIPOLYGON (((-78.42478 4...
3 MULTIPOLYGON (((-78.41661 4...
4 MULTIPOLYGON (((-78.41067 4...
5 MULTIPOLYGON (((-78.40836 4...
6 MULTIPOLYGON (((-75.95433 4...
Questions to answer:
- How many hospitals are in your dataset? There are 223 hospitals in my dataset.
- How many census tracts? 3345
- What coordinate reference system is each dataset in? As the printed output above shows, the hospitals are in WGS 84 and the census tracts are in NAD83.
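The counts and CRS answers above can be checked directly from the loaded layers. The sketch below uses two tiny made-up layers (`hospitals_toy` and `tracts_toy` are hypothetical stand-ins, not the assignment files) to show the calls involved: `nrow()` for feature counts and `st_crs()` for the coordinate reference system.

```r
library(sf)

# Toy stand-ins for the real layers (made-up coordinates, hypothetical names)
hospitals_toy <- st_as_sf(
  data.frame(name = c("A", "B"), lon = c(-79.9, -75.2), lat = c(40.2, 40.0)),
  coords = c("lon", "lat"), crs = 4326   # WGS 84
)
tracts_toy <- st_as_sf(
  data.frame(id = 1, lon = -77.0, lat = 39.8),
  coords = c("lon", "lat"), crs = 4269   # NAD83
)

# Feature counts
n_hospitals <- nrow(hospitals_toy)

# EPSG code of each layer, and whether the two layers share a CRS
hospital_epsg <- st_crs(hospitals_toy)$epsg
tract_epsg <- st_crs(tracts_toy)$epsg
same_crs <- st_crs(hospitals_toy) == st_crs(tracts_toy)
```

When `same_crs` is `FALSE`, as it is here, the layers need a common CRS (via `st_transform()`) before any distance or join operation.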
Step 2: Get Demographic Data
Use tidycensus to download tract-level demographic data for Pennsylvania.
Required variables:
- Total population
- Median household income
- Population 65 years and over (you may need to sum multiple age categories)
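library(tigris)
library(sf)
library(dplyr)

# Join to tract boundaries
pa_tracts <- tracts(state = "PA", cb = TRUE)

# Correct approach: drop the geometry column from the right-hand table before joining
pa_joined <- pa_tracts %>%
  left_join(st_drop_geometry(pa_demo_clean), by = "GEOID")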
Questions to answer:
- What year of ACS data are you using? 2021
- How many tracts have missing income data? 66
- What is the median income across all PA census tracts? 65195.5
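As a rough illustration of how the three answers above can be derived, the sketch below works on a made-up wide-format table (`acs_toy`; all column names and values are hypothetical). In a real tidycensus pull the 65-and-over counts would come from several sex-by-age cells that are summed in the same way.

```r
# Toy wide-format ACS extract (made-up values, hypothetical column names)
acs_toy <- data.frame(
  GEOID = c("42001031101", "42013100400", "42013100500", "42013100800"),
  total_pop = c(4000, 2500, 3100, 1800),
  median_income = c(72000, 48000, NA, 51000),
  age_65_74 = c(400, 300, 350, 200),
  age_75_84 = c(200, 180, 150, 120),
  age_85_up = c(80, 70, 60, 50)
)

# Sum the age categories into one 65+ count, then express it as a share
age_cols <- c("age_65_74", "age_75_84", "age_85_up")
acs_toy$pop_65_plus <- rowSums(acs_toy[, age_cols])
acs_toy$pct_65_plus <- 100 * acs_toy$pop_65_plus / acs_toy$total_pop

# Tracts with missing income, and the median across tracts that report it
n_missing_income <- sum(is.na(acs_toy$median_income))
median_income_all <- median(acs_toy$median_income, na.rm = TRUE)
```

Using `na.rm = TRUE` matters here: without it, a single missing tract would make the median `NA`.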
Step 3: Define Vulnerable Populations
Identify census tracts with vulnerable populations based on TWO criteria:
1. Low median household income (choose an appropriate threshold)
2. Significant elderly population (choose an appropriate threshold)
Questions to answer:
- What income threshold did you choose and why? I chose an income threshold equal to 80% of the statewide median household income (approximately $54,000) to identify tracts that fall substantially below the state’s overall income level.
- What elderly population threshold did you choose and why? I defined tracts with more than 20% of residents aged 65 and over as having a significant elderly population, since this represents roughly the top quartile of tracts in Pennsylvania in terms of elderly share.
- How many tracts meet your vulnerability criteria?
- What percentage of PA census tracts are considered vulnerable by your definition? These tracts account for approximately 6.7% of all Pennsylvania census tracts, according to my criteria.
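The two-criteria rule described above (income below 80% of the median, more than 20% of residents aged 65 and over) can be sketched as follows. The data frame `tracts_toy` and its values are made up for illustration, and treating tracts with missing income as not vulnerable is an assumption of this sketch, not something the analysis states.

```r
# Toy tract table (made-up values, hypothetical column names)
tracts_toy <- data.frame(
  GEOID = c("t1", "t2", "t3", "t4", "t5"),
  median_income = c(40000, 90000, 52000, NA, 60000),
  pct_65_plus = c(25, 30, 15, 22, 21)
)

# Thresholds: 80% of the median income, and more than 20% aged 65+
state_median <- median(tracts_toy$median_income, na.rm = TRUE)
income_cutoff <- 0.8 * state_median
elderly_cutoff <- 20

# A tract is vulnerable only if BOTH criteria hold;
# tracts with missing income are treated as not vulnerable (assumption)
tracts_toy$vulnerable <- !is.na(tracts_toy$median_income) &
  tracts_toy$median_income < income_cutoff &
  tracts_toy$pct_65_plus > elderly_cutoff

n_vulnerable <- sum(tracts_toy$vulnerable)
pct_vulnerable <- 100 * mean(tracts_toy$vulnerable)
```

Computing the count and the percentage from the same logical column answers both of the last two questions at once.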
Step 4: Calculate Distance to Hospitals
For each vulnerable tract, calculate the distance to the nearest hospital.
Requirements:
- Use an appropriate projected coordinate system for Pennsylvania
- Calculate distances in miles
- Explain why you chose your projection
Questions to answer:
- What is the average distance to the nearest hospital for vulnerable tracts? ≈ 5.8 miles
- What is the maximum distance? ≈ 27.4 miles
- How many vulnerable tracts are more than 15 miles from the nearest hospital? 12 tracts
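One way to compute a nearest-hospital distance in miles is sketched below on made-up points. Several choices here are assumptions rather than the method used in this analysis: tracts are represented by points (for example centroids), the projected CRS is EPSG:5070 (NAD83 / Conus Albers, in meters), and the toy layer names are hypothetical. A Pennsylvania State Plane zone would be an equally reasonable projection.

```r
library(sf)

# Toy hospitals (WGS 84) and toy tract points (NAD83); coordinates are made up
hospitals_toy <- st_as_sf(
  data.frame(name = c("Hospital A", "Hospital B"),
             lon = c(-75.17, -79.91), lat = c(39.93, 40.18)),
  coords = c("lon", "lat"), crs = 4326
)
tract_pts_toy <- st_as_sf(
  data.frame(GEOID = c("tract_east", "tract_west"),
             lon = c(-75.20, -79.00), lat = c(40.00, 40.00)),
  coords = c("lon", "lat"), crs = 4269
)

# Project both layers to the same meter-based CRS (assumed: EPSG:5070)
target_crs <- 5070
hospitals_proj <- st_transform(hospitals_toy, target_crs)
tracts_proj <- st_transform(tract_pts_toy, target_crs)

# Index of the nearest hospital for each tract, then the pairwise distance
nearest_idx <- st_nearest_feature(tracts_proj, hospitals_proj)
dist_m <- st_distance(tracts_proj, hospitals_proj[nearest_idx, ], by_element = TRUE)

# Convert meters to miles (1 mile = 1609.344 m)
tracts_proj$dist_to_hosp_mi <- as.numeric(dist_m) / 1609.344
```

Projecting first matters because degrees of longitude and latitude do not correspond to a constant ground distance, so distances should be measured in a CRS whose units are meters or feet.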
Step 5: Identify Underserved Areas
Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.
Questions to answer:
- How many tracts are underserved? 16
- What percentage of vulnerable tracts are underserved? 5.76%
- Does this surprise you? Why or why not? I am not sure that 15 miles is the right cutoff for “underserved”; a 10-mile threshold might be a better definition. Under the 15-mile definition, though, only about 5.8% of vulnerable census tracts are more than 15 miles from the nearest hospital, which means that roughly 94% of vulnerable tracts are relatively close to a hospital.
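The underserved flag and its percentage follow directly from the distance column. A minimal sketch with made-up distances (the vector below is hypothetical, not the analysis data):

```r
# Made-up distances to the nearest hospital, in miles
dist_to_hosp_mi <- c(2.4, 17.5, 9.1, 27.4, 14.9, 15.0)

# Underserved means strictly more than 15 miles away
underserved <- dist_to_hosp_mi > 15

n_underserved <- sum(underserved)
pct_underserved <- round(100 * mean(underserved), 2)
```

Deriving both the count and the percentage from the same logical vector keeps the reported numbers consistent with each other, and changing the cutoff (for example to 10 miles) only requires editing one line.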
Step 6: Aggregate to County Level
Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.
Required county-level statistics:
- Number of vulnerable tracts
- Number of underserved tracts
- Percentage of vulnerable tracts that are underserved
- Average distance to nearest hospital for vulnerable tracts
- Total vulnerable population
Questions to answer:
- Which 5 counties have the highest percentage of underserved vulnerable tracts? 1. PERRY 2. CLINTON 3. SULLIVAN 4. BRADFORD 5. CAMERON
- Which counties have the most vulnerable people living far from hospitals?
- Are there any patterns in where underserved counties are located?
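The required county-level statistics can be produced with a grouped summary once each vulnerable tract carries a county label. The sketch below assumes a plain table `vul_toy` with made-up values and hypothetical column names.

```r
library(dplyr)

# Toy vulnerable-tract table (made-up values, hypothetical column names)
vul_toy <- data.frame(
  county = c("PERRY", "PERRY", "ELK", "ELK", "ELK"),
  pop = c(3000, 2815, 2000, 2500, 1500),
  dist_mi = c(16.2, 18.9, 9.0, 17.5, 11.0)
)

county_summary_toy <- vul_toy %>%
  mutate(underserved = dist_mi > 15) %>%
  group_by(county) %>%
  summarise(
    n_vulnerable = n(),                       # number of vulnerable tracts
    n_underserved = sum(underserved),         # number of underserved tracts
    pct_underserved = 100 * mean(underserved),
    avg_dist_mi = mean(dist_mi),
    vulnerable_pop = sum(pop),
    underserved_pop = sum(pop[underserved]),
    .groups = "drop"
  ) %>%
  arrange(desc(pct_underserved))
```

If tracts are assigned to counties with a spatial join, it is worth checking that each tract matches exactly one county, since a tract polygon that touches a neighbouring county boundary can otherwise be counted more than once.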
Step 7: Create Summary Table
Create a professional table showing the top 10 priority counties for healthcare investment.
County-Level Summary of Vulnerable and Underserved Census Tracts in Pennsylvania

| County | # Vulnerable Tracts | # Underserved Tracts | % Underserved | Avg. Distance (mi) | Total Vulnerable Pop. | Underserved Pop. |
|:---|---:|---:|---:|---:|---:|---:|
| PERRY | 2 | 2 | 100.0% | 17.53 | 5,815 | 5,815 |
| CLINTON | 3 | 2 | 66.7% | 13.84 | 7,750 | 4,615 |
| SULLIVAN | 3 | 2 | 66.7% | 18.28 | 6,949 | 3,031 |
| BRADFORD | 4 | 2 | 50.0% | 14.14 | 14,748 | 7,562 |
| CAMERON | 6 | 3 | 50.0% | 14.09 | 13,466 | 6,763 |
| COLUMBIA | 2 | 1 | 50.0% | 9.45 | 5,897 | 970 |
| DAUPHIN | 2 | 1 | 50.0% | 10.01 | 5,838 | 4,028 |
| JUNIATA | 2 | 1 | 50.0% | 12.56 | 5,461 | 1,787 |
| ELK | 8 | 3 | 37.5% | 12.71 | 19,260 | 8,045 |
| CLEARFIELD | 11 | 4 | 36.4% | 13.47 | 36,592 | 10,359 |
| CENTRE | 3 | 1 | 33.3% | 15.08 | 11,735 | 2,167 |
| FOREST | 3 | 1 | 33.3% | 14.09 | 6,031 | 2,603 |
| POTTER | 6 | 2 | 33.3% | 10.29 | 18,675 | 4,615 |
| BEDFORD | 4 | 1 | 25.0% | 11.45 | 17,181 | 4,021 |
| CLARION | 8 | 2 | 25.0% | 12.40 | 22,104 | 7,149 |
| FRANKLIN | 4 | 1 | 25.0% | 5.61 | 14,268 | 1,787 |
| HUNTINGDON | 4 | 1 | 25.0% | 11.41 | 12,266 | 1,787 |
| MIFFLIN | 4 | 1 | 25.0% | 10.28 | 10,268 | 1,787 |
| MONROE | 4 | 1 | 25.0% | 12.07 | 8,388 | 1,134 |
| CRAWFORD | 6 | 1 | 16.7% | 8.37 | 16,001 | 2,661 |
| JEFFERSON | 6 | 1 | 16.7% | 9.40 | 16,311 | 4,546 |
| LYCOMING | 6 | 1 | 16.7% | 8.27 | 21,547 | 970 |
| TIOGA | 6 | 1 | 16.7% | 10.30 | 21,484 | 1,953 |
| VENANGO | 6 | 1 | 16.7% | 10.49 | 14,597 | 2,603 |
| MCKEAN | 7 | 1 | 14.3% | 10.04 | 16,897 | 2,662 |
| ARMSTRONG | 8 | 1 | 12.5% | 10.74 | 23,310 | 4,546 |
| SOMERSET | 8 | 1 | 12.5% | 9.24 | 24,350 | 4,021 |
| INDIANA | 9 | 1 | 11.1% | 7.36 | 31,370 | 4,546 |
| WARREN | 9 | 1 | 11.1% | 7.01 | 27,976 | 2,603 |
| NORTHUMBERLAND | 10 | 1 | 10.0% | 11.66 | 33,370 | 4,028 |
| LUZERNE | 14 | 1 | 7.1% | 5.30 | 35,497 | 970 |
| ALLEGHENY | 47 | 0 | 0.0% | 2.45 | 109,061 | 0 |
| BEAVER | 8 | 0 | 0.0% | 4.34 | 17,930 | 0 |
| BERKS | 2 | 0 | 0.0% | 2.95 | 4,073 | 0 |
| BLAIR | 5 | 0 | 0.0% | 2.97 | 14,939 | 0 |
| BUTLER | 3 | 0 | 0.0% | 8.60 | 9,545 | 0 |
| CAMBRIA | 14 | 0 | 0.0% | 6.35 | 34,583 | 0 |
| CARBON | 3 | 0 | 0.0% | 8.14 | 7,325 | 0 |
| CUMBERLAND | 2 | 0 | 0.0% | 1.64 | 6,734 | 0 |
| DELAWARE | 7 | 0 | 0.0% | 1.27 | 26,294 | 0 |
| ERIE | 7 | 0 | 0.0% | 2.36 | 20,784 | 0 |
| FAYETTE | 16 | 0 | 0.0% | 4.05 | 50,911 | 0 |
| FULTON | 1 | 0 | 0.0% | 10.64 | 3,354 | 0 |
| LACKAWANNA | 10 | 0 | 0.0% | 4.01 | 29,208 | 0 |
| LANCASTER | 2 | 0 | 0.0% | 6.60 | 8,186 | 0 |
| LAWRENCE | 5 | 0 | 0.0% | 5.80 | 12,399 | 0 |
| LEBANON | 2 | 0 | 0.0% | 3.61 | 8,456 | 0 |
| LEHIGH | 4 | 0 | 0.0% | 2.09 | 11,383 | 0 |
| MERCER | 5 | 0 | 0.0% | 4.02 | 16,963 | 0 |
| MONTGOMERY | 3 | 0 | 0.0% | 1.12 | 7,081 | 0 |
| MONTOUR | 1 | 0 | 0.0% | 0.56 | 4,282 | 0 |
| NORTHAMPTON | 4 | 0 | 0.0% | 2.41 | 11,585 | 0 |
| PHILADELPHIA | 21 | 0 | 0.0% | 1.00 | 73,982 | 0 |
| PIKE | 2 | 0 | 0.0% | 14.14 | 3,903 | 0 |
| SCHUYLKILL | 6 | 0 | 0.0% | 4.38 | 17,988 | 0 |
| SUSQUEHANNA | 1 | 0 | 0.0% | 5.79 | 1,658 | 0 |
| UNION | 3 | 0 | 0.0% | 4.15 | 16,364 | 0 |
| WASHINGTON | 5 | 0 | 0.0% | 4.66 | 11,865 | 0 |
| WAYNE | 4 | 0 | 0.0% | 8.73 | 9,971 | 0 |
| WESTMORELAND | 22 | 0 | 0.0% | 4.05 | 54,779 | 0 |
| YORK | 4 | 0 | 0.0% | 1.44 | 11,785 | 0 |
Requirements:
- Use knitr::kable() or similar for formatting
- Include descriptive column names
- Format numbers appropriately (commas for population, percentages, etc.)
- Add an informative caption
- Sort by priority (you decide the metric)
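A minimal sketch of these formatting requirements with `knitr::kable()` is shown below. The two rows reuse the Perry and Clinton figures from the table above; the data frame name, its column names, and the choice to sort by percent underserved are assumptions made for illustration.

```r
library(knitr)

# Two rows taken from the county summary above (hypothetical object and column names)
top_toy <- data.frame(
  county = c("CLINTON", "PERRY"),
  pct_underserved = c(66.7, 100),
  avg_dist_mi = c(13.84, 17.53),
  vulnerable_pop = c(7750, 5815)
)

# Sort by the chosen priority metric and keep at most the top 10 counties
top_toy <- head(top_toy[order(-top_toy$pct_underserved), ], 10)

# Format percentages and add thousands separators to population
top_toy$pct_underserved <- sprintf("%.1f%%", top_toy$pct_underserved)
top_toy$vulnerable_pop <- format(top_toy$vulnerable_pop, big.mark = ",")

priority_table <- kable(
  top_toy,
  row.names = FALSE,
  col.names = c("County", "% Underserved", "Avg. Distance (mi)", "Total Vulnerable Pop."),
  align = "lrrr",
  caption = "Top priority counties for healthcare investment (toy example)"
)
```

Printing `priority_table` in a Quarto chunk renders the formatted table; `head(..., 10)` is what limits the output to the ten highest-priority counties.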
Part 2: Comprehensive Visualization
Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.
Map 1: County-Level Choropleth
Create a choropleth map showing healthcare access challenges at the county level.
Your Task:
library(sf)
library(tidyverse)
library(ggplot2)
library(viridis)
library(scales)

# Put counties and hospitals in the same CRS as the vulnerable tracts
counties_proj <- st_transform(counties, st_crs(vul_proj))
hospitals_proj <- st_transform(hospitals, st_crs(counties_proj))

# Attach the county-level summary to the county polygons
county_map <- counties_proj %>%
  left_join(county_summary, by = c("COUNTY_NAM" = "COUNTY_NAME"))

ggplot() +
  # County polygons filled by percent underserved (linewidth sets the outline width)
  geom_sf(data = county_map, aes(fill = pct_underserved), color = "white", linewidth = 0.3) +
  # Hospital locations
  geom_sf(data = hospitals_proj, shape = 21, color = "black", fill = "red", size = 2, alpha = 0.8) +
  scale_fill_viridis(
    name = "% Underserved Vulnerable Tracts",
    option = "magma",
    direction = -1,
    labels = function(x) paste0(round(x, 1), "%")
  ) +
  labs(
    title = "Healthcare Access Challenges in Pennsylvania",
    subtitle = "Counties colored by percentage of vulnerable tracts that are underserved (>15 miles from nearest hospital)",
    caption = "Data sources: ACS 2021 (via tidycensus), hospital locations (PA DOH), analysis by Yanyang Chen"
  ) +
  theme_void() +
  theme(
    legend.position = "right",
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 8),
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 10, margin = margin(b = 8)),
    plot.caption = element_text(size = 8, color = "gray40")
  )
Requirements:
- Fill counties by percentage of vulnerable tracts that are underserved
- Include hospital locations as points
- Use an appropriate color scheme
- Include clear title, subtitle, and caption
- Use theme_void() or similar clean theme
- Add a legend with formatted labels
Map 2: Detailed Vulnerability Map
Create a map highlighting underserved vulnerable tracts.
Your Task:
# Create detailed tract-level map
library(sf)
library(tidyverse)
library(ggplot2)

# Bring all three layers into one common CRS
tracts_proj <- st_transform(vul_proj, st_crs(counties))
counties_proj <- st_transform(counties, st_crs(tracts_proj))
hospitals_proj <- st_transform(hospitals, st_crs(tracts_proj))

ggplot() +
  # County outlines for context
  geom_sf(data = counties_proj, fill = NA, color = "gray60", linewidth = 0.3) +
  # All vulnerable tracts in a muted color
  geom_sf(data = tracts_proj, aes(geometry = geometry), fill = "lightgray", color = NA) +
  # Underserved vulnerable tracts in a contrasting color
  geom_sf(data = filter(tracts_proj, underserved == TRUE),
          aes(geometry = geometry), fill = "#d73027", color = NA, alpha = 0.9) +
  # Hospital locations
  geom_sf(data = hospitals_proj, shape = 21, fill = "yellow", color = "black", size = 2, alpha = 0.9) +
  labs(
    title = "Map 2. Detailed Tract-Level Vulnerability in Pennsylvania",
    subtitle = "Underserved vulnerable tracts (>15 miles from nearest hospital) shown in red",
    caption = "Data sources: ACS 2021 via tidycensus; Hospital data: PA Department of Health"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(size = 14, face = "bold", color = "black"),
    plot.subtitle = element_text(size = 10, color = "gray30"),
    plot.caption = element_text(size = 8, color = "gray40"),
    panel.background = element_rect(fill = "white", color = NA)
  )
Requirements:
- Show underserved vulnerable tracts in a contrasting color
- Include county boundaries for context
- Show hospital locations
- Use appropriate visual hierarchy (what should stand out?)
- Include informative title and subtitle
Chart: Distribution Analysis
Create a visualization showing the distribution of distances to hospitals for vulnerable populations.
Your Task:
# Create distribution visualization
library(ggplot2)
library(scales)

# Distribution of distances for vulnerable tracts
ggplot(vul_proj, aes(x = dist_to_hosp_mi)) +
  # Histogram layer
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 fill = "#3182bd", color = "white", alpha = 0.7) +
  # Density curve layer
  geom_density(color = "#de2d26", linewidth = 1, alpha = 0.6) +
  # Title and axis labels
  labs(
    title = "Distribution of Distances to Nearest Hospital",
    subtitle = "For vulnerable census tracts across Pennsylvania",
    x = "Distance to Nearest Hospital (miles)",
    y = "Density",
    caption = "Data: ACS 2021 (via tidycensus) and PA DOH hospital locations"
  ) +
  # Styling
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 10, color = "gray30"),
    plot.caption = element_text(size = 8, color = "gray40"),
    panel.grid.minor = element_blank()
  )
Suggested chart types:
- Histogram or density plot of distances
- Box plot comparing distances across regions
- Bar chart of underserved tracts by county
- Scatter plot of distance vs. vulnerable population size

Requirements:
- Clear axes labels with units
- Appropriate title
- Professional formatting
- Brief interpretation (1-2 sentences as a caption or in text)
Part 3: Bring Your Own Data Analysis
Choose your own additional spatial dataset and conduct a supplementary analysis.
Challenge Options
Choose ONE of the following challenge exercises, or propose your own research question using OpenDataPhilly data (https://opendataphilly.org/datasets/).
Note that these are just loose suggestions to spark ideas. Follow them or make your own as the data permits and as your ideas evolve. This analysis should include bringing in your own dataset and ensuring that the projections/CRS of your layers align and are appropriate for the analysis (not lat/long or geodetic coordinate systems). The analysis portion should include some combination of spatial and attribute operations to answer a relatively straightforward question.
Education & Youth Services
Option A: Educational Desert Analysis
- Data: Schools, Libraries, Recreation Centers, Census tracts (child population)
- Question: “Which neighborhoods lack adequate educational infrastructure for children?”
- Operations: Buffer schools/libraries (0.5 mile walking distance), identify coverage gaps, overlay with child population density
- Policy relevance: School district planning, library placement, after-school program siting
Option B: School Safety Zones
- Data: Schools, Crime Incidents, Bike Network
- Question: “Are school zones safe for walking/biking, or are they crime hotspots?”
- Operations: Buffer schools (1000ft safety zone), spatial join with crime incidents, assess bike infrastructure coverage
- Policy relevance: Safe Routes to School programs, crossing guard placement
Environmental Justice
Option C: Green Space Equity
- Data: Parks, Street Trees, Census tracts (race/income demographics)
- Question: “Do low-income and minority neighborhoods have equitable access to green space?”
- Operations: Buffer parks (10-minute walk = 0.5 mile), calculate tree canopy or park acreage per capita, compare by demographics
- Policy relevance: Climate resilience, environmental justice, urban forestry investment
Public Safety & Justice
Option D: Crime & Community Resources
- Data: Crime Incidents, Recreation Centers, Libraries, Street Lights
- Question: “Are high-crime areas underserved by community resources?”
- Operations: Aggregate crime counts to census tracts or neighborhoods, count community resources per area, spatial correlation analysis
- Policy relevance: Community investment, violence prevention strategies
Infrastructure & Services
Option E: Polling Place Accessibility
- Data: Polling Places, SEPTA stops, Census tracts (elderly population, disability rates)
- Question: “Are polling places accessible for elderly and disabled voters?”
- Operations: Buffer polling places and transit stops, identify vulnerable populations, find areas lacking access
- Policy relevance: Voting rights, election infrastructure, ADA compliance
Health & Wellness
Option F: Recreation & Population Health
- Data: Recreation Centers, Playgrounds, Parks, Census tracts (demographics)
- Question: “Is lack of recreation access associated with vulnerable populations?”
- Operations: Calculate recreation facilities per capita by neighborhood, buffer facilities for walking access, overlay with demographic indicators
- Policy relevance: Public health investment, recreation programming, obesity prevention
Emergency Services
Option G: EMS Response Coverage
- Data: Fire Stations, EMS stations, Population density, High-rise buildings
- Question: “Are population-dense areas adequately covered by emergency services?”
- Operations: Create service area buffers (5-minute drive = ~2 miles), assess population coverage, identify gaps in high-density areas
- Policy relevance: Emergency preparedness, station siting decisions
Arts & Culture
Option H: Cultural Asset Distribution
- Data: Public Art, Museums, Historic sites/markers, Neighborhoods
- Question: “Do all neighborhoods have equitable access to cultural amenities?”
- Operations: Count cultural assets per neighborhood, normalize by population, compare distribution across demographic groups
- Policy relevance: Cultural equity, tourism, quality of life, neighborhood identity
Data Sources
OpenDataPhilly: https://opendataphilly.org/datasets/
- Most datasets available as GeoJSON, Shapefile, or CSV with coordinates
- Always check the Metadata for a data dictionary of the fields.

Additional Sources:
- Pennsylvania Open Data: https://data.pa.gov/
- Census Bureau (via tidycensus): Demographics, economic indicators, commute patterns
- TIGER/Line (via tigris): Geographic boundaries
Recommended Starting Points
If you’re feeling confident: Choose an advanced challenge with multiple data layers. If you are a beginner, choose something more manageable that helps you understand the basics.
If you have a different idea: Propose your own question! Just make sure:
- You can access the spatial data
- You can perform at least 2 spatial operations
X Y
Min. :428729 Min. :4406078
1st Qu.:476765 1st Qu.:4422717
Median :485018 Median :4428790
Mean :483162 Mean :4429572
3rd Qu.:489725 3rd Qu.:4435632
Max. :520914 Max. :4464839
# ----------------------------
# 3. Put all layers in the same CRS (EPSG:26918, meters)
# ----------------------------
polling_proj <- st_transform(polling_proj, 26918)
bus_stops <- st_transform(bus_stops, 26918)
philly_boundary <- st_transform(philly_boundary, 26918)

# ----------------------------
# 4. Distance from each polling place to the nearest bus stop (meters)
# ----------------------------
nearest_dist <- st_distance(polling_proj, bus_stops)
polling_proj$nearest_dist_m <- apply(nearest_dist, 1, min)

# ----------------------------
# 5. Define underserved (>500 m counts as not transit-accessible)
# ----------------------------
polling_proj$underserved <- ifelse(polling_proj$nearest_dist_m > 500, 1, 0)

# Compute the share and build the dynamic annotation text
pct_underserved <- round(mean(polling_proj$underserved) * 100, 1)
annotation_text <- paste0("Underserved Polling Places: ", pct_underserved, "%")

# ----------------------------
# 6. Plot (three layers plus the dynamic percentage label)
# ----------------------------
ggplot() +
  # City boundary base layer
  geom_sf(data = philly_boundary, fill = "grey95", color = "black", linewidth = 0.3) +
  # Bus stops
  geom_sf(data = bus_stops, color = "blue", size = 0.3, alpha = 0.6) +
  # Polling places (red = underserved, green = served)
  geom_sf(data = polling_proj, aes(color = as.factor(underserved)), size = 1.4, alpha = 0.9) +
  scale_color_manual(
    values = c("0" = "green3", "1" = "red3"),
    labels = c("Served", "Underserved"),
    name = "Polling Place"
  ) +
  coord_sf(
    xlim = st_bbox(philly_boundary)[c("xmin", "xmax")],
    ylim = st_bbox(philly_boundary)[c("ymin", "ymax")],
    expand = FALSE
  ) +
  theme_void() +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(size = 11, color = "grey30"),
    plot.caption = element_text(size = 9, color = "grey40")
  ) +
  labs(
    title = "Polling Place Accessibility and Transit Coverage in Philadelphia",
    subtitle = "Red = Underserved (>500m from nearest bus stop) | Green = Served | Blue = Bus Stops",
    caption = "Data: OpenDataPhilly, SEPTA GTFS, Pennsylvania County Boundaries"
  ) +
  # Dynamic text annotation (bottom-right corner)
  annotate("text",
           x = st_bbox(philly_boundary)[["xmax"]] - 6000,
           y = st_bbox(philly_boundary)[["ymin"]] + 4000,
           label = annotation_text,
           hjust = 1, vjust = 0, color = "grey20",
           size = 4, fontface = "italic")
# A tibble: 2 × 3
status count pct
<chr> <int> <dbl>
1 Served (≤500m) 1700 99.8
2 Underserved (>500m) 3 0.2
# ----------------------------
# 2. Draw the summary bar chart
# ----------------------------
ggplot(polling_summary, aes(x = status, y = pct, fill = status)) +
  geom_col(width = 0.6) +
  geom_text(aes(label = paste0(pct, "%")), vjust = -0.3, size = 5, fontface = "bold") +
  scale_fill_manual(values = c("Underserved (>500m)" = "red3", "Served (≤500m)" = "green3")) +
  theme_minimal(base_size = 14) +
  labs(
    title = "Polling Place Accessibility Summary in Philadelphia",
    subtitle = "Proportion of Polling Places within / beyond 500m of a Bus Stop",
    x = NULL, y = "Percentage of Polling Places",
    caption = "Data: OpenDataPhilly, SEPTA GTFS"
  ) +
  theme(legend.position = "none")
Questions to answer:
- What dataset did you choose and why? Philadelphia polling places and SEPTA bus stops, to assess public transit accessibility to voting locations.
- What is the data source and date? OpenDataPhilly, SEPTA GTFS feed (2023).
- How many features does it contain? ~800 polling places, ~12,000 bus stops.
- What CRS is it in? Did you need to transform it? NAD83 / UTM Zone 18N (EPSG:26918); transformed from WGS84 (EPSG:4326) for meter-based distance analysis.
Pose a research question
Write a clear research statement that your analysis will answer.
Do polling places in Philadelphia have adequate access to public transit, and which areas remain underserved by bus routes within a 500-meter distance?
Conduct spatial analysis
Use at least TWO spatial operations to answer your research question.
Analysis requirements:
- Clear code comments explaining each step
- Appropriate CRS transformations
- Summary statistics or counts
- At least one map showing your findings
- Brief interpretation of results (3-5 sentences)
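As an alternative to building a full polling-place-by-bus-stop distance matrix, the 500-meter check can also be sketched with `st_is_within_distance()` and `st_nearest_feature()`. This is a different technique from the one used in the code above, shown here on made-up points in EPSG:26918; the toy layer names and coordinates are hypothetical.

```r
library(sf)

# Toy layers already in NAD83 / UTM Zone 18N (meters); coordinates are made up
polling_toy <- st_as_sf(
  data.frame(id = 1:3,
             x = c(485000, 486000, 490000),
             y = c(4428000, 4428000, 4430000)),
  coords = c("x", "y"), crs = 26918
)
stops_toy <- st_as_sf(
  data.frame(x = c(485300, 486000), y = c(4428300, 4428100)),
  coords = c("x", "y"), crs = 26918
)

# Spatial operation 1: which bus stops lie within 500 m of each polling place?
near_stops <- st_is_within_distance(polling_toy, stops_toy, dist = 500)
polling_toy$underserved <- lengths(near_stops) == 0

# Spatial operation 2: distance to the single nearest stop (meters)
nearest_idx <- st_nearest_feature(polling_toy, stops_toy)
polling_toy$nearest_dist_m <- as.numeric(
  st_distance(polling_toy, stops_toy[nearest_idx, ], by_element = TRUE)
)

# Summary statistic: share of polling places flagged as underserved
pct_underserved_toy <- round(100 * mean(polling_toy$underserved), 1)
```

With many polling places and bus stops, this approach avoids holding every pairwise distance in memory while producing the same served/underserved flag.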
Your interpretation:
The spatial analysis reveals that over 99% of polling places in Philadelphia are located within 500 meters of a bus stop, indicating strong transit accessibility to voting locations. Only a very small portion (<1%) are classified as underserved, mostly on the city’s outer edges.
This suggests that Philadelphia’s public transit network effectively supports access to polling sites, minimizing transportation barriers for most residents. Future analysis could compare underserved areas with demographic data to identify any equity concerns.
Finally - A few comments about your incorporation of feedback!
Take a few moments to clean up your markdown document and then write a line or two or three about how you may have incorporated feedback that you received after your first assignment.
Incorporation of Feedback
After receiving feedback from the first assignment, I focused on improving spatial data organization and clarity in visualization. This time, I made sure to use consistent CRS across all layers, clean up unnecessary warnings, and add clear map annotations to make results easier to interpret. I also structured the code more logically and added comments to improve readability.
Submission Requirements
What to submit:
Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
Use embed-resources: true in YAML so it’s a single file
All code should run without errors
All maps and charts should display correctly
File naming: LastName_FirstName_Assignment2.html and LastName_FirstName_Assignment2.qmd
Assignment 2: Spatial Analysis and Visualization
Healthcare Access and Equity in Pennsylvania
Author
Yanyang Chen
Published
October 15, 2025
Assignment Overview
Learning Objectives: - Apply spatial operations to answer policy-relevant research questions - Integrate census demographic data with spatial analysis - Create publication-quality visualizations and maps - Work with spatial data from multiple sources - Communicate findings effectively for policy audiences
Part 1: Healthcare Access for Vulnerable Populations
Research Question
Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?
Your analysis should identify counties that should be priorities for healthcare investment and policy intervention.
Required Analysis Steps
Complete the following analysis, documenting each step with code and brief explanations:
Step 1: Data Collection (5 points)
Load the required spatial data: - Pennsylvania county boundaries - Pennsylvania hospitals (from lecture data) - Pennsylvania census tracts
Simple feature collection with 6 features and 11 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -80.27907 ymin: 39.80913 xmax: -75.17005 ymax: 40.24273
Geodetic CRS: WGS 84
CHIEF_EXEC CHIEF_EX_1
1 Peter J Adamo President
2 Autumn DeShields Chief Executive Officer
3 Shawn Parekh Chief Executive Officer
4 DIANE HRITZ Chief Executive Officer
5 Tim Harclerode Chief Executive Officer
6 Richard McLaughlin MD MBA Chief Executive Officer
FACILITY_U LONGITUDE COUNTY
1 https://www.phhealthcare.org -79.91131 Washington
2 https://www.malvernbh.com -75.17005 Philadelphia
3 https://roxboroughmemorial.com -75.20963 Philadelphia
4 https://www.ashospital.net -80.27907 Washington
5 https://www.conemaugh.org -79.02513 Somerset
6 https://towerhealth.org -75.61213 Montgomery
FACILITY_N STREET
1 Penn Highlands Mon Valley 1163 Country Club Road
2 MALVERN BEHAVIORAL HEALTH 1930 South Broad Street Unit 4
3 Roxborough Memorial Hospital 5800 Ridge Avenue
4 ADVANCED SURGICAL HOSPITAL 100 TRICH DRIVE\nSUITE 1
5 DLP Conemaugh Meyersdale Medical Center 200 Hospital Drive
6 Pottstown Hospital, LLC 1600 East High Street
CITY_OR_BO LATITUDE TELEPHONE_ ZIP_CODE geometry
1 Monongahela 40.18193 724-258-1000 15063 POINT (-79.91131 40.18193)
2 Philadelphia 39.92619 610-480-8919 19145 POINT (-75.17005 39.9262)
3 Philadelphia 40.02869 215-483-9900 19128 POINT (-75.20963 40.02869)
4 WASHINGTON 40.15655 7248840710 15301 POINT (-80.27907 40.15655)
5 Meyersdale 39.80913 814-634-5911 15552 POINT (-79.02513 39.80913)
6 Pottstown 40.24273 6103277000 19464 POINT (-75.61213 40.24273)
head(census_tracts)
Simple feature collection with 6 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -78.42478 ymin: 39.79351 xmax: -75.93766 ymax: 40.54328
Geodetic CRS: NAD83
STATEFP COUNTYFP TRACTCE GEOIDFQ GEOID NAME
1 42 001 031101 1400000US42001031101 42001031101 311.01
2 42 013 100400 1400000US42013100400 42013100400 1004
3 42 013 100500 1400000US42013100500 42013100500 1005
4 42 013 100800 1400000US42013100800 42013100800 1008
5 42 013 101900 1400000US42013101900 42013101900 1019
6 42 011 011200 1400000US42011011200 42011011200 112
NAMELSAD STUSPS NAMELSADCO STATE_NAME LSAD ALAND AWATER
1 Census Tract 311.01 PA Adams County Pennsylvania CT 3043185 0
2 Census Tract 1004 PA Blair County Pennsylvania CT 993724 0
3 Census Tract 1005 PA Blair County Pennsylvania CT 1130204 0
4 Census Tract 1008 PA Blair County Pennsylvania CT 996553 0
5 Census Tract 1019 PA Blair County Pennsylvania CT 573726 0
6 Census Tract 112 PA Berks County Pennsylvania CT 1539365 9308
geometry
1 MULTIPOLYGON (((-77.03108 3...
2 MULTIPOLYGON (((-78.42478 4...
3 MULTIPOLYGON (((-78.41661 4...
4 MULTIPOLYGON (((-78.41067 4...
5 MULTIPOLYGON (((-78.40836 4...
6 MULTIPOLYGON (((-75.95433 4...
Questions to answer: - How many hospitals are in your dataset? There are 223 hospital in my dataset. - How many census tracts? 3345 - What coordinate reference system is each dataset in? NAD 83
Step 2: Get Demographic Data
Use tidycensus to download tract-level demographic data for Pennsylvania.
Required variables: - Total population - Median household income - Population 65 years and over (you may need to sum multiple age categories)
# Join to tract boundariespa_tracts <-tracts(state ="PA", cb =TRUE)# ✅ 正确写法:去掉右侧的几何列再 joinpa_joined <- pa_tracts %>%left_join(st_drop_geometry(pa_demo_clean), by ="GEOID")
Questions to answer: - What year of ACS data are you using? 2021 - How many tracts have missing income data? 66
- What is the median income across all PA census tracts? 65195.5
Step 3: Define Vulnerable Populations
Identify census tracts with vulnerable populations based on TWO criteria: 1. Low median household income (choose an appropriate threshold) 2. Significant elderly population (choose an appropriate threshold)
Questions to answer: - What income threshold did you choose and why? I chose an income threshold equal to 80% of the statewide median household income (approximately $54,000) to identify tracts that fall substantially below the state’s overall income level. - What elderly population threshold did you choose and why? I defined tracts with more than 20% of residents aged 65 and over as having a significant elderly population, since this represents roughly the top quartile of tracts in Pennsylvania in terms of elderly share. - How many tracts meet your vulnerability criteria? - What percentage of PA census tracts are considered vulnerable by your definition? These tracts account for approximately 6.7% of all Pennsylvania census tracts, according to my criteria.
Step 4: Calculate Distance to Hospitals
For each vulnerable tract, calculate the distance to the nearest hospital.
Requirements: - Use an appropriate projected coordinate system for Pennsylvania - Calculate distances in miles - Explain why you chose your projection
Questions to answer: - What is the average distance to the nearest hospital for vulnerable tracts? ≈ 5.8 miles - What is the maximum distance? ≈ 27.4 miles - How many vulnerable tracts are more than 15 miles from the nearest hospital 12 tracts
Step 5: Identify Underserved Areas
Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.
Questions to answer: - How many tracts are underserved? 16
What percentage of vulnerable tracts are underserved? 5.76%
Does this surprise you? Why or why not? In my opinion I feel confused about the distance we’ve defined as a undersevred distance.It would be good if we use 10 miles to the defenition of “undersevrved”.But only about 5.8% of vulnerable census tracts are located more than 15 miles from the nearest hospital, indicating that approximately 94%–95% of vulnerable areas are relatively close to hospital facilities.
Step 6: Aggregate to County Level
Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.
Required county-level statistics: - Number of vulnerable tracts - Number of underserved tracts
- Percentage of vulnerable tracts that are underserved - Average distance to nearest hospital for vulnerable tracts - Total vulnerable population
Questions to answer: - Which 5 counties have the highest percentage of underserved vulnerable tracts? 1. PERRY 2. CLINTON 3. SULLIVAN 4. BRADFORD 5. CAMERON
Which counties have the most vulnerable people living far from hospitals?
Are there any patterns in where underserved counties are located?
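The required county-level statistics can be produced with a grouped summary once each tract carries a county label. The sketch below is a minimal example on a toy table; the county names, column names, and values are hypothetical. It assumes each tract appears exactly once: if counties are attached with a polygon-on-polygon spatial join, a tract that touches a county border can be matched more than once, so keying on a county identifier is one way to avoid double counting.

```r
library(dplyr)

# Toy tract-level table; county names and numbers are illustrative only.
vul_tracts <- data.frame(
  county          = c("ALPHA", "ALPHA", "BETA", "BETA", "BETA"),
  dist_to_hosp_mi = c(17.0, 12.0, 3.0, 16.0, 5.0),
  total_pop       = c(2000, 3000, 4000, 1500, 2500)
) %>%
  mutate(underserved = dist_to_hosp_mi > 15)

# One row per county with the statistics listed in Step 6.
county_summary <- vul_tracts %>%
  group_by(county) %>%
  summarise(
    n_vulnerable    = n(),
    n_underserved   = sum(underserved),
    pct_underserved = 100 * n_underserved / n_vulnerable,
    avg_dist_mi     = mean(dist_to_hosp_mi),
    vulnerable_pop  = sum(total_pop),
    underserved_pop = sum(total_pop[underserved]),
    .groups = "drop"
  ) %>%
  arrange(desc(pct_underserved))
```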
Step 7: Create Summary Table
Create a professional table showing the top 10 priority counties for healthcare investment.
County-Level Summary of Vulnerable and Underserved Census Tracts in Pennsylvania
| County | # Vulnerable Tracts | # Underserved Tracts | % Underserved | Avg. Distance (mi) | Total Vulnerable Pop. | Underserved Pop. |
|---|---|---|---|---|---|---|
| PERRY | 2 | 2 | 100.0% | 17.53 | 5,815 | 5,815 |
| CLINTON | 3 | 2 | 66.7% | 13.84 | 7,750 | 4,615 |
| SULLIVAN | 3 | 2 | 66.7% | 18.28 | 6,949 | 3,031 |
| BRADFORD | 4 | 2 | 50.0% | 14.14 | 14,748 | 7,562 |
| CAMERON | 6 | 3 | 50.0% | 14.09 | 13,466 | 6,763 |
| COLUMBIA | 2 | 1 | 50.0% | 9.45 | 5,897 | 970 |
| DAUPHIN | 2 | 1 | 50.0% | 10.01 | 5,838 | 4,028 |
| JUNIATA | 2 | 1 | 50.0% | 12.56 | 5,461 | 1,787 |
| ELK | 8 | 3 | 37.5% | 12.71 | 19,260 | 8,045 |
| CLEARFIELD | 11 | 4 | 36.4% | 13.47 | 36,592 | 10,359 |
| CENTRE | 3 | 1 | 33.3% | 15.08 | 11,735 | 2,167 |
| FOREST | 3 | 1 | 33.3% | 14.09 | 6,031 | 2,603 |
| POTTER | 6 | 2 | 33.3% | 10.29 | 18,675 | 4,615 |
| BEDFORD | 4 | 1 | 25.0% | 11.45 | 17,181 | 4,021 |
| CLARION | 8 | 2 | 25.0% | 12.40 | 22,104 | 7,149 |
| FRANKLIN | 4 | 1 | 25.0% | 5.61 | 14,268 | 1,787 |
| HUNTINGDON | 4 | 1 | 25.0% | 11.41 | 12,266 | 1,787 |
| MIFFLIN | 4 | 1 | 25.0% | 10.28 | 10,268 | 1,787 |
| MONROE | 4 | 1 | 25.0% | 12.07 | 8,388 | 1,134 |
| CRAWFORD | 6 | 1 | 16.7% | 8.37 | 16,001 | 2,661 |
| JEFFERSON | 6 | 1 | 16.7% | 9.40 | 16,311 | 4,546 |
| LYCOMING | 6 | 1 | 16.7% | 8.27 | 21,547 | 970 |
| TIOGA | 6 | 1 | 16.7% | 10.30 | 21,484 | 1,953 |
| VENANGO | 6 | 1 | 16.7% | 10.49 | 14,597 | 2,603 |
| MCKEAN | 7 | 1 | 14.3% | 10.04 | 16,897 | 2,662 |
| ARMSTRONG | 8 | 1 | 12.5% | 10.74 | 23,310 | 4,546 |
| SOMERSET | 8 | 1 | 12.5% | 9.24 | 24,350 | 4,021 |
| INDIANA | 9 | 1 | 11.1% | 7.36 | 31,370 | 4,546 |
| WARREN | 9 | 1 | 11.1% | 7.01 | 27,976 | 2,603 |
| NORTHUMBERLAND | 10 | 1 | 10.0% | 11.66 | 33,370 | 4,028 |
| LUZERNE | 14 | 1 | 7.1% | 5.30 | 35,497 | 970 |
| ALLEGHENY | 47 | 0 | 0.0% | 2.45 | 109,061 | 0 |
| BEAVER | 8 | 0 | 0.0% | 4.34 | 17,930 | 0 |
| BERKS | 2 | 0 | 0.0% | 2.95 | 4,073 | 0 |
| BLAIR | 5 | 0 | 0.0% | 2.97 | 14,939 | 0 |
| BUTLER | 3 | 0 | 0.0% | 8.60 | 9,545 | 0 |
| CAMBRIA | 14 | 0 | 0.0% | 6.35 | 34,583 | 0 |
| CARBON | 3 | 0 | 0.0% | 8.14 | 7,325 | 0 |
| CUMBERLAND | 2 | 0 | 0.0% | 1.64 | 6,734 | 0 |
| DELAWARE | 7 | 0 | 0.0% | 1.27 | 26,294 | 0 |
| ERIE | 7 | 0 | 0.0% | 2.36 | 20,784 | 0 |
| FAYETTE | 16 | 0 | 0.0% | 4.05 | 50,911 | 0 |
| FULTON | 1 | 0 | 0.0% | 10.64 | 3,354 | 0 |
| LACKAWANNA | 10 | 0 | 0.0% | 4.01 | 29,208 | 0 |
| LANCASTER | 2 | 0 | 0.0% | 6.60 | 8,186 | 0 |
| LAWRENCE | 5 | 0 | 0.0% | 5.80 | 12,399 | 0 |
| LEBANON | 2 | 0 | 0.0% | 3.61 | 8,456 | 0 |
| LEHIGH | 4 | 0 | 0.0% | 2.09 | 11,383 | 0 |
| MERCER | 5 | 0 | 0.0% | 4.02 | 16,963 | 0 |
| MONTGOMERY | 3 | 0 | 0.0% | 1.12 | 7,081 | 0 |
| MONTOUR | 1 | 0 | 0.0% | 0.56 | 4,282 | 0 |
| NORTHAMPTON | 4 | 0 | 0.0% | 2.41 | 11,585 | 0 |
| PHILADELPHIA | 21 | 0 | 0.0% | 1.00 | 73,982 | 0 |
| PIKE | 2 | 0 | 0.0% | 14.14 | 3,903 | 0 |
| SCHUYLKILL | 6 | 0 | 0.0% | 4.38 | 17,988 | 0 |
| SUSQUEHANNA | 1 | 0 | 0.0% | 5.79 | 1,658 | 0 |
| UNION | 3 | 0 | 0.0% | 4.15 | 16,364 | 0 |
| WASHINGTON | 5 | 0 | 0.0% | 4.66 | 11,865 | 0 |
| WAYNE | 4 | 0 | 0.0% | 8.73 | 9,971 | 0 |
| WESTMORELAND | 22 | 0 | 0.0% | 4.05 | 54,779 | 0 |
| YORK | 4 | 0 | 0.0% | 1.44 | 11,785 | 0 |
Requirements: - Use knitr::kable() or similar for formatting - Include descriptive column names - Format numbers appropriately (commas for population, percentages, etc.) - Add an informative caption - Sort by priority (you decide the metric)
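A top-N table that meets these requirements could be built along the following lines. This is a sketch, not the code that produced the summary table: the five rows are re-entered by hand from that table, the ranking metric (underserved population) is an assumption, and `n_top` is set to 3 only to keep the example short.

```r
library(knitr)

# First rows of the county summary, re-entered by hand for the demo.
county_tbl <- data.frame(
  county          = c("PERRY", "CLINTON", "SULLIVAN", "BRADFORD", "CAMERON"),
  pct_underserved = c(100.0, 66.7, 66.7, 50.0, 50.0),
  avg_dist_mi     = c(17.53, 13.84, 18.28, 14.14, 14.09),
  underserved_pop = c(5815, 4615, 3031, 7562, 6763)
)

n_top <- 3  # the assignment asks for 10; 3 keeps the demo short

# Priority metric (an assumption): underserved population, largest first.
top_tbl <- head(county_tbl[order(-county_tbl$underserved_pop), ], n_top)

# Format numbers as display strings.
top_tbl$pct_underserved <- sprintf("%.1f%%", top_tbl$pct_underserved)
top_tbl$underserved_pop <- format(top_tbl$underserved_pop, big.mark = ",")

priority_table <- kable(
  top_tbl,
  row.names = FALSE,
  col.names = c("County", "% Underserved", "Avg. Distance (mi)",
                "Underserved Pop."),
  caption   = "Top priority counties by underserved vulnerable population"
)
priority_table
```

Ranking by underserved population rather than by percentage keeps counties with only one or two vulnerable tracts from dominating the list.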
Part 2: Comprehensive Visualization
Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.
Map 1: County-Level Choropleth
Create a choropleth map showing healthcare access challenges at the county level.
Your Task:
library(sf)
library(tidyverse)
library(ggplot2)
library(viridis)
library(scales)

# Put counties and hospitals in the same CRS as the vulnerable-tract layer
counties_proj <- st_transform(counties, st_crs(vul_proj))
hospitals_proj <- st_transform(hospitals, st_crs(counties_proj))

# Attach the county-level summary to the county polygons
county_map <- counties_proj %>%
  left_join(county_summary, by = c("COUNTY_NAM" = "COUNTY_NAME"))

ggplot() +
  # Counties filled by share of vulnerable tracts that are underserved
  geom_sf(data = county_map, aes(fill = pct_underserved),
          color = "white", linewidth = 0.3) +
  # Hospital locations as points
  geom_sf(data = hospitals_proj,
          shape = 21, color = "black", fill = "red", size = 2, alpha = 0.8) +
  scale_fill_viridis(
    name = "% Underserved Vulnerable Tracts",
    option = "magma",
    direction = -1,
    labels = function(x) paste0(round(x, 1), "%")
  ) +
  labs(
    title = "Healthcare Access Challenges in Pennsylvania",
    subtitle = "Counties colored by percentage of vulnerable tracts that are underserved (>15 miles from nearest hospital)",
    caption = "Data sources: ACS 2021 (via tidycensus), hospital locations (PA DOH), analysis by Yanyang Chen"
  ) +
  theme_void() +
  theme(
    legend.position = "right",
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 8),
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 10, margin = margin(b = 8)),
    plot.caption = element_text(size = 8, color = "gray40")
  )
Requirements: - Fill counties by percentage of vulnerable tracts that are underserved - Include hospital locations as points - Use an appropriate color scheme - Include clear title, subtitle, and caption - Use theme_void() or similar clean theme - Add a legend with formatted labels
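Because the choropleth fill comes from a name-based join between the county layer and the summary table, any spelling or case mismatch leaves a county without a value. The sketch below shows one way to check for unmatched names before mapping. The two small data frames are hypothetical stand-ins; only the key names `COUNTY_NAM` and `COUNTY_NAME` are taken from the map code.

```r
library(dplyr)

# Hypothetical stand-ins for the county layer's attributes and the summary.
county_attrs   <- data.frame(COUNTY_NAM = c("PERRY", "CLINTON", "MCKEAN"))
county_summary <- data.frame(COUNTY_NAME = c("PERRY", "Clinton", "MCKEAN"),
                             pct_underserved = c(100.0, 66.7, 14.3))

# Summary rows whose name has no exact match in the county layer.
unmatched <- anti_join(county_summary, county_attrs,
                       by = c("COUNTY_NAME" = "COUNTY_NAM"))

# Normalising case on the summary side is one simple fix.
county_summary$COUNTY_NAME <- toupper(county_summary$COUNTY_NAME)
still_unmatched <- anti_join(county_summary, county_attrs,
                             by = c("COUNTY_NAME" = "COUNTY_NAM"))
```

An empty `still_unmatched` means every summary row will find its polygon; counties that have no summary row at all will still show as missing on the map.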
Map 2: Detailed Vulnerability Map
Create a map highlighting underserved vulnerable tracts.
Your Task:
# Create detailed tract-level map
library(sf)
library(tidyverse)
library(ggplot2)

# Put all three layers in a common CRS for plotting
tracts_proj <- st_transform(vul_proj, st_crs(counties))
counties_proj <- st_transform(counties, st_crs(tracts_proj))
hospitals_proj <- st_transform(hospitals, st_crs(tracts_proj))

ggplot() +
  # County outlines for context
  geom_sf(data = counties_proj, fill = NA, color = "gray60", linewidth = 0.3) +
  # All vulnerable tracts in a muted gray
  geom_sf(data = tracts_proj, aes(geometry = geometry),
          fill = "lightgray", color = NA) +
  # Underserved vulnerable tracts in a contrasting red
  geom_sf(data = filter(tracts_proj, underserved == TRUE),
          aes(geometry = geometry), fill = "#d73027", color = NA, alpha = 0.9) +
  # Hospital locations
  geom_sf(data = hospitals_proj,
          shape = 21, fill = "yellow", color = "black", size = 2, alpha = 0.9) +
  labs(
    title = "Map 2. Detailed Tract-Level Vulnerability in Pennsylvania",
    subtitle = "Underserved vulnerable tracts (>15 miles from nearest hospital) shown in red",
    caption = "Data sources: ACS 2021 via tidycensus; Hospital data: PA Department of Health"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(size = 14, face = "bold", color = "black"),
    plot.subtitle = element_text(size = 10, color = "gray30"),
    plot.caption = element_text(size = 8, color = "gray40"),
    panel.background = element_rect(fill = "white", color = NA)
  )
Requirements: - Show underserved vulnerable tracts in a contrasting color - Include county boundaries for context - Show hospital locations - Use appropriate visual hierarchy (what should stand out?) - Include informative title and subtitle
Chart: Distribution Analysis
Create a visualization showing the distribution of distances to hospitals for vulnerable populations.
Your Task:
# Create distribution visualization
library(ggplot2)
library(scales)

# Step: Distribution of distances for vulnerable tracts ----
ggplot(vul_proj, aes(x = dist_to_hosp_mi)) +
  # Histogram layer
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 fill = "#3182bd", color = "white", alpha = 0.7) +
  # Density curve layer
  geom_density(color = "#de2d26", linewidth = 1, alpha = 0.6) +
  # Titles and axis labels
  labs(
    title = "Distribution of Distances to Nearest Hospital",
    subtitle = "For vulnerable census tracts across Pennsylvania",
    x = "Distance to Nearest Hospital (miles)",
    y = "Density",
    caption = "Data: ACS 2021 (via tidycensus) and PA DOH hospital locations"
  ) +
  # Styling
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 10, color = "gray30"),
    plot.caption = element_text(size = 8, color = "gray40"),
    panel.grid.minor = element_blank()
  )
Suggested chart types: - Histogram or density plot of distances - Box plot comparing distances across regions - Bar chart of underserved tracts by county - Scatter plot of distance vs. vulnerable population size
Requirements: - Clear axes labels with units - Appropriate title - Professional formatting - Brief interpretation (1-2 sentences as a caption or in text)
Part 3: Bring Your Own Data Analysis
Choose your own additional spatial dataset and conduct a supplementary analysis.
Challenge Options
Choose ONE of the following challenge exercises, or propose your own research question using OpenDataPhilly data (https://opendataphilly.org/datasets/).
Note that these are just loose suggestions to spark ideas; follow them or make your own as the data permits and as your ideas evolve. This analysis should include bringing in your own dataset and ensuring that the projection/CRS of your layers align and are appropriate for the analysis (not lat/long or geodetic coordinate systems). The analysis portion should include some combination of spatial and attribute operations to answer a relatively straightforward question.
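A quick check that layers are projected and share a CRS might look like the sketch below. The two point layers are invented; EPSG:26918 (NAD83 / UTM Zone 18N) is the projected CRS used later in this report for Philadelphia, and any other metre- or foot-based CRS suited to the study area would work the same way.

```r
library(sf)

# Two hypothetical point layers near Philadelphia, both in lon/lat.
layer_a <- st_as_sf(data.frame(lon = -75.16, lat = 39.95),
                    coords = c("lon", "lat"), crs = 4326)  # WGS 84
layer_b <- st_as_sf(data.frame(lon = -75.20, lat = 40.00),
                    coords = c("lon", "lat"), crs = 4269)  # NAD83

# TRUE means the layer is in degrees and should be projected
# before measuring distances or buffering.
needs_projection <- st_is_longlat(layer_a)

# Project both layers to the same metre-based CRS.
layer_a_proj <- st_transform(layer_a, 26918)
layer_b_proj <- st_transform(layer_b, 26918)

# Confirm the layers now agree.
crs_aligned <- st_crs(layer_a_proj) == st_crs(layer_b_proj)
```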
Education & Youth Services
Option A: Educational Desert Analysis - Data: Schools, Libraries, Recreation Centers, Census tracts (child population) - Question: “Which neighborhoods lack adequate educational infrastructure for children?” - Operations: Buffer schools/libraries (0.5 mile walking distance), identify coverage gaps, overlay with child population density - Policy relevance: School district planning, library placement, after-school program siting
Option B: School Safety Zones - Data: Schools, Crime Incidents, Bike Network - Question: “Are school zones safe for walking/biking, or are they crime hotspots?” - Operations: Buffer schools (1000ft safety zone), spatial join with crime incidents, assess bike infrastructure coverage - Policy relevance: Safe Routes to School programs, crossing guard placement
Environmental Justice
Option C: Green Space Equity - Data: Parks, Street Trees, Census tracts (race/income demographics) - Question: “Do low-income and minority neighborhoods have equitable access to green space?” - Operations: Buffer parks (10-minute walk = 0.5 mile), calculate tree canopy or park acreage per capita, compare by demographics - Policy relevance: Climate resilience, environmental justice, urban forestry investment
Public Safety & Justice
Option D: Crime & Community Resources - Data: Crime Incidents, Recreation Centers, Libraries, Street Lights - Question: “Are high-crime areas underserved by community resources?” - Operations: Aggregate crime counts to census tracts or neighborhoods, count community resources per area, spatial correlation analysis - Policy relevance: Community investment, violence prevention strategies
Infrastructure & Services
Option E: Polling Place Accessibility - Data: Polling Places, SEPTA stops, Census tracts (elderly population, disability rates) - Question: “Are polling places accessible for elderly and disabled voters?” - Operations: Buffer polling places and transit stops, identify vulnerable populations, find areas lacking access - Policy relevance: Voting rights, election infrastructure, ADA compliance
Health & Wellness
Option F: Recreation & Population Health - Data: Recreation Centers, Playgrounds, Parks, Census tracts (demographics) - Question: “Is lack of recreation access associated with vulnerable populations?” - Operations: Calculate recreation facilities per capita by neighborhood, buffer facilities for walking access, overlay with demographic indicators - Policy relevance: Public health investment, recreation programming, obesity prevention
Emergency Services
Option G: EMS Response Coverage - Data: Fire Stations, EMS stations, Population density, High-rise buildings - Question: “Are population-dense areas adequately covered by emergency services?” - Operations: Create service area buffers (5-minute drive = ~2 miles), assess population coverage, identify gaps in high-density areas - Policy relevance: Emergency preparedness, station siting decisions
Arts & Culture
Option H: Cultural Asset Distribution - Data: Public Art, Museums, Historic sites/markers, Neighborhoods - Question: “Do all neighborhoods have equitable access to cultural amenities?” - Operations: Count cultural assets per neighborhood, normalize by population, compare distribution across demographic groups - Policy relevance: Cultural equity, tourism, quality of life, neighborhood identity
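Several of the options above rely on the same pattern: buffer a set of facilities by a walking distance, then test which locations fall inside the buffered area. The sketch below illustrates that pattern with invented coordinates in EPSG:26918 (metres); the layer names `facilities` and `homes` are hypothetical.

```r
library(sf)

# Hypothetical facility and home locations in a metre-based CRS (EPSG:26918).
facilities <- st_as_sf(data.frame(x = c(485000, 490000),
                                  y = c(4425000, 4430000)),
                       coords = c("x", "y"), crs = 26918)
homes <- st_as_sf(data.frame(x = c(485300, 487500, 490000),
                             y = c(4425400, 4427500, 4430700)),
                  coords = c("x", "y"), crs = 26918)

# 0.5 mile expressed in metres, matching the CRS units.
half_mile_m <- 0.5 * 1609.344

# Buffer each facility, then dissolve the buffers into one service area.
service_area <- st_union(st_buffer(facilities, dist = half_mile_m))

# A home is covered if it falls inside the service area.
homes$covered <- lengths(st_intersects(homes, service_area)) > 0
pct_covered   <- round(100 * mean(homes$covered), 1)
```

Straight-line buffers ignore the street network, so they tend to overstate walking access; they are a first approximation rather than a true walkshed.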
Data Sources
OpenDataPhilly: https://opendataphilly.org/datasets/ - Most datasets available as GeoJSON, Shapefile, or CSV with coordinates - Always check the Metadata for a data dictionary of the fields.
Additional Sources: - Pennsylvania Open Data: https://data.pa.gov/ - Census Bureau (via tidycensus): Demographics, economic indicators, commute patterns - TIGER/Line (via tigris): Geographic boundaries
Recommended Starting Points
If you’re feeling confident: Choose an advanced challenge with multiple data layers. If you are a beginner, choose something more manageable that helps you understand the basics.
If you have a different idea: Propose your own question! Just make sure: - You can access the spatial data - You can perform at least 2 spatial operations
X Y
Min. :428729 Min. :4406078
1st Qu.:476765 1st Qu.:4422717
Median :485018 Median :4428790
Mean :483162 Mean :4429572
3rd Qu.:489725 3rd Qu.:4435632
Max. :520914 Max. :4464839
# ----------------------------
# 3. Put all layers in one CRS
# ----------------------------
polling_proj <- st_transform(polling_proj, 26918)
bus_stops <- st_transform(bus_stops, 26918)
philly_boundary <- st_transform(philly_boundary, 26918)

# ----------------------------
# 4. Distance from each polling place to the nearest bus stop (meters)
# ----------------------------
nearest_dist <- st_distance(polling_proj, bus_stops)
polling_proj$nearest_dist_m <- apply(nearest_dist, 1, min)

# ----------------------------
# 5. Define underserved (>500 m from a stop counts as not transit-accessible)
# ----------------------------
polling_proj$underserved <- ifelse(polling_proj$nearest_dist_m > 500, 1, 0)

# Compute the share and build the annotation text
pct_underserved <- round(mean(polling_proj$underserved) * 100, 1)
annotation_text <- paste0("Underserved Polling Places: ", pct_underserved, "%")

# ----------------------------
# 6. Plot (three layers plus the dynamic percentage label)
# ----------------------------
ggplot() +
  # City boundary base map
  geom_sf(data = philly_boundary, fill = "grey95", color = "black", linewidth = 0.3) +
  # Bus stops
  geom_sf(data = bus_stops, color = "blue", size = 0.3, alpha = 0.6) +
  # Polling places (red = underserved, green = served)
  geom_sf(data = polling_proj, aes(color = as.factor(underserved)),
          size = 1.4, alpha = 0.9) +
  scale_color_manual(
    values = c("0" = "green3", "1" = "red3"),
    labels = c("Served", "Underserved"),
    name = "Polling Place"
  ) +
  coord_sf(
    xlim = st_bbox(philly_boundary)[c("xmin", "xmax")],
    ylim = st_bbox(philly_boundary)[c("ymin", "ymax")],
    expand = FALSE
  ) +
  theme_void() +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(size = 11, color = "grey30"),
    plot.caption = element_text(size = 9, color = "grey40")
  ) +
  labs(
    title = "Polling Place Accessibility and Transit Coverage in Philadelphia",
    subtitle = "Red = Underserved (>500m from nearest bus stop) | Green = Served | Blue = Bus Stops",
    caption = "Data: OpenDataPhilly, SEPTA GTFS, Pennsylvania County Boundaries"
  ) +
  # Dynamic text annotation (bottom-right corner)
  annotate("text",
           x = st_bbox(philly_boundary)[["xmax"]] - 6000,
           y = st_bbox(philly_boundary)[["ymin"]] + 4000,
           label = annotation_text,
           hjust = 1, vjust = 0, color = "grey20",
           size = 4, fontface = "italic")
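When only the 500 m yes/no classification is needed, the full polling-place-by-bus-stop distance matrix can be avoided. The sketch below shows an alternative using `st_is_within_distance()`, which is a different technique from the `st_distance()` approach in the code above. The coordinates and the layer names `polling` and `stops` are invented for illustration.

```r
library(sf)

# Hypothetical projected coordinates (EPSG:26918, metres).
polling <- st_as_sf(data.frame(x = c(485000, 486000, 500000),
                               y = c(4425000, 4426000, 4440000)),
                    coords = c("x", "y"), crs = 26918)
stops <- st_as_sf(data.frame(x = c(485100, 486300),
                             y = c(4425000, 4426200)),
                  coords = c("x", "y"), crs = 26918)

# For each polling place, the indices of stops within 500 m (sparse list).
near_stops <- st_is_within_distance(polling, stops, dist = 500)

# Underserved = no stop within 500 m.
polling$underserved <- lengths(near_stops) == 0
```

This gives the same served/underserved split as thresholding the minimum distance, but it does not return the distance itself.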
# A tibble: 2 × 3
status count pct
<chr> <int> <dbl>
1 Served (≤500m) 1700 99.8
2 Underserved (>500m) 3 0.2
# ----------------------------
# 2. Draw the summary bar chart
# ----------------------------
ggplot(polling_summary, aes(x = status, y = pct, fill = status)) +
  geom_col(width = 0.6) +
  geom_text(aes(label = paste0(pct, "%")), vjust = -0.3, size = 5, fontface = "bold") +
  scale_fill_manual(values = c("Underserved (>500m)" = "red3",
                               "Served (≤500m)" = "green3")) +
  theme_minimal(base_size = 14) +
  labs(
    title = "Polling Place Accessibility Summary in Philadelphia",
    subtitle = "Proportion of Polling Places within / beyond 500m of a Bus Stop",
    x = NULL, y = "Percentage of Polling Places",
    caption = "Data: OpenDataPhilly, SEPTA GTFS"
  ) +
  theme(legend.position = "none")
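The step that builds `polling_summary` is not shown in this report. One way such a table could be derived from the nearest-stop distances is sketched below, using invented distances; the column names `status`, `count`, and `pct` follow the printed tibble, and everything else is an assumption.

```r
library(dplyr)

# Hypothetical nearest-stop distances in metres.
polling <- data.frame(nearest_dist_m = c(100, 600, 300, 450))

polling_summary <- polling %>%
  # Label each polling place by the 500 m rule
  mutate(status = ifelse(nearest_dist_m > 500,
                         "Underserved (>500m)", "Served (≤500m)")) %>%
  # One row per status with its count
  count(status, name = "count") %>%
  # Share of all polling places, in percent
  mutate(pct = round(100 * count / sum(count), 1))
```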
Questions to answer: - What dataset did you choose and why? Philadelphia polling places and SEPTA bus stops, to assess public transit accessibility to voting locations. - What is the data source and date? OpenDataPhilly, SEPTA GTFS feed (2023). - How many features does it contain? ~800 polling places, ~12,000 bus stops.
- What CRS is it in? Did you need to transform it? NAD83 / UTM Zone 18N (EPSG:26918); transformed from WGS84 (EPSG:4326) for meter-based distance analysis.
Pose a research question
Write a clear research statement that your analysis will answer.
Do polling places in Philadelphia have adequate access to public transit, and which areas remain underserved by bus routes within a 500-meter distance?
Conduct spatial analysis
Use at least TWO spatial operations to answer your research question.
Analysis requirements: - Clear code comments explaining each step - Appropriate CRS transformations - Summary statistics or counts - At least one map showing your findings - Brief interpretation of results (3-5 sentences)
Your interpretation:
The spatial analysis reveals that over 99% of polling places in Philadelphia are located within 500 meters of a bus stop, indicating strong transit accessibility to voting locations. Only a very small portion (<1%) are classified as underserved, mostly on the city’s outer edges.
This suggests that Philadelphia’s public transit network effectively supports access to polling sites, minimizing transportation barriers for most residents. Future analysis could compare underserved areas with demographic data to identify any equity concerns.
Finally - A few comments about your incorporation of feedback!
Take a few moments to clean up your markdown document and then write a line or two or three about how you may have incorporated feedback that you received after your first assignment.
Incorporation of Feedback
After receiving feedback from the first assignment, I focused on improving spatial data organization and clarity in visualization. This time, I made sure to use consistent CRS across all layers, clean up unnecessary warnings, and add clear map annotations to make results easier to interpret. I also structured the code more logically and added comments to improve readability.
Submission Requirements
What to submit:
Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
Use embed-resources: true in YAML so it’s a single file
All code should run without errors
All maps and charts should display correctly
File naming: LastName_FirstName_Assignment2.html and LastName_FirstName_Assignment2.qmd
Assignment 2: Spatial Analysis and Visualization
Assignment 2: Spatial Analysis and Visualization
Healthcare Access and Equity in Pennsylvania
Author
Yanyang Chen
Published
October 15, 2025
Assignment Overview
Learning Objectives: - Apply spatial operations to answer policy-relevant research questions - Integrate census demographic data with spatial analysis - Create publication-quality visualizations and maps - Work with spatial data from multiple sources - Communicate findings effectively for policy audiences
Part 1: Healthcare Access for Vulnerable Populations
Research Question
Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?
Your analysis should identify counties that should be priorities for healthcare investment and policy intervention.
Required Analysis Steps
Complete the following analysis, documenting each step with code and brief explanations:
Step 1: Data Collection (5 points)
Load the required spatial data: - Pennsylvania county boundaries - Pennsylvania hospitals (from lecture data) - Pennsylvania census tracts
Simple feature collection with 6 features and 11 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -80.27907 ymin: 39.80913 xmax: -75.17005 ymax: 40.24273
Geodetic CRS: WGS 84
CHIEF_EXEC CHIEF_EX_1
1 Peter J Adamo President
2 Autumn DeShields Chief Executive Officer
3 Shawn Parekh Chief Executive Officer
4 DIANE HRITZ Chief Executive Officer
5 Tim Harclerode Chief Executive Officer
6 Richard McLaughlin MD MBA Chief Executive Officer
FACILITY_U LONGITUDE COUNTY
1 https://www.phhealthcare.org -79.91131 Washington
2 https://www.malvernbh.com -75.17005 Philadelphia
3 https://roxboroughmemorial.com -75.20963 Philadelphia
4 https://www.ashospital.net -80.27907 Washington
5 https://www.conemaugh.org -79.02513 Somerset
6 https://towerhealth.org -75.61213 Montgomery
FACILITY_N STREET
1 Penn Highlands Mon Valley 1163 Country Club Road
2 MALVERN BEHAVIORAL HEALTH 1930 South Broad Street Unit 4
3 Roxborough Memorial Hospital 5800 Ridge Avenue
4 ADVANCED SURGICAL HOSPITAL 100 TRICH DRIVE\nSUITE 1
5 DLP Conemaugh Meyersdale Medical Center 200 Hospital Drive
6 Pottstown Hospital, LLC 1600 East High Street
CITY_OR_BO LATITUDE TELEPHONE_ ZIP_CODE geometry
1 Monongahela 40.18193 724-258-1000 15063 POINT (-79.91131 40.18193)
2 Philadelphia 39.92619 610-480-8919 19145 POINT (-75.17005 39.9262)
3 Philadelphia 40.02869 215-483-9900 19128 POINT (-75.20963 40.02869)
4 WASHINGTON 40.15655 7248840710 15301 POINT (-80.27907 40.15655)
5 Meyersdale 39.80913 814-634-5911 15552 POINT (-79.02513 39.80913)
6 Pottstown 40.24273 6103277000 19464 POINT (-75.61213 40.24273)
head(census_tracts)
Simple feature collection with 6 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -78.42478 ymin: 39.79351 xmax: -75.93766 ymax: 40.54328
Geodetic CRS: NAD83
STATEFP COUNTYFP TRACTCE GEOIDFQ GEOID NAME
1 42 001 031101 1400000US42001031101 42001031101 311.01
2 42 013 100400 1400000US42013100400 42013100400 1004
3 42 013 100500 1400000US42013100500 42013100500 1005
4 42 013 100800 1400000US42013100800 42013100800 1008
5 42 013 101900 1400000US42013101900 42013101900 1019
6 42 011 011200 1400000US42011011200 42011011200 112
NAMELSAD STUSPS NAMELSADCO STATE_NAME LSAD ALAND AWATER
1 Census Tract 311.01 PA Adams County Pennsylvania CT 3043185 0
2 Census Tract 1004 PA Blair County Pennsylvania CT 993724 0
3 Census Tract 1005 PA Blair County Pennsylvania CT 1130204 0
4 Census Tract 1008 PA Blair County Pennsylvania CT 996553 0
5 Census Tract 1019 PA Blair County Pennsylvania CT 573726 0
6 Census Tract 112 PA Berks County Pennsylvania CT 1539365 9308
geometry
1 MULTIPOLYGON (((-77.03108 3...
2 MULTIPOLYGON (((-78.42478 4...
3 MULTIPOLYGON (((-78.41661 4...
4 MULTIPOLYGON (((-78.41067 4...
5 MULTIPOLYGON (((-78.40836 4...
6 MULTIPOLYGON (((-75.95433 4...
Questions to answer: - How many hospitals are in your dataset? There are 223 hospital in my dataset. - How many census tracts? 3345 - What coordinate reference system is each dataset in? NAD 83
Step 2: Get Demographic Data
Use tidycensus to download tract-level demographic data for Pennsylvania.
Required variables: - Total population - Median household income - Population 65 years and over (you may need to sum multiple age categories)
# Join to tract boundariespa_tracts <-tracts(state ="PA", cb =TRUE)# ✅ 正确写法:去掉右侧的几何列再 joinpa_joined <- pa_tracts %>%left_join(st_drop_geometry(pa_demo_clean), by ="GEOID")
Questions to answer: - What year of ACS data are you using? 2021 - How many tracts have missing income data? 66
- What is the median income across all PA census tracts? 65195.5
Step 3: Define Vulnerable Populations
Identify census tracts with vulnerable populations based on TWO criteria: 1. Low median household income (choose an appropriate threshold) 2. Significant elderly population (choose an appropriate threshold)
Questions to answer: - What income threshold did you choose and why? I chose an income threshold equal to 80% of the statewide median household income (approximately $54,000) to identify tracts that fall substantially below the state’s overall income level. - What elderly population threshold did you choose and why? I defined tracts with more than 20% of residents aged 65 and over as having a significant elderly population, since this represents roughly the top quartile of tracts in Pennsylvania in terms of elderly share. - How many tracts meet your vulnerability criteria? - What percentage of PA census tracts are considered vulnerable by your definition? These tracts account for approximately 6.7% of all Pennsylvania census tracts, according to my criteria.
Step 4: Calculate Distance to Hospitals
For each vulnerable tract, calculate the distance to the nearest hospital.
Requirements: - Use an appropriate projected coordinate system for Pennsylvania - Calculate distances in miles - Explain why you chose your projection
Questions to answer: - What is the average distance to the nearest hospital for vulnerable tracts? ≈ 5.8 miles - What is the maximum distance? ≈ 27.4 miles - How many vulnerable tracts are more than 15 miles from the nearest hospital 12 tracts
Step 5: Identify Underserved Areas
Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.
Questions to answer: - How many tracts are underserved? 16
What percentage of vulnerable tracts are underserved? 5.76%
Does this surprise you? Why or why not? In my opinion I feel confused about the distance we’ve defined as a undersevred distance.It would be good if we use 10 miles to the defenition of “undersevrved”.But only about 5.8% of vulnerable census tracts are located more than 15 miles from the nearest hospital, indicating that approximately 94%–95% of vulnerable areas are relatively close to hospital facilities.
Step 6: Aggregate to County Level
Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.
Required county-level statistics: - Number of vulnerable tracts - Number of underserved tracts
- Percentage of vulnerable tracts that are underserved - Average distance to nearest hospital for vulnerable tracts - Total vulnerable population
Questions to answer: - Which 5 counties have the highest percentage of underserved vulnerable tracts? 1.PERRY 2.CLINTON 3.SULLIVAN 4.BRADFORD 5.ACMERON
Which counties have the most vulnerable people living far from hospitals?
Are there any patterns in where underserved counties are located?
Step 7: Create Summary Table
Create a professional table showing the top 10 priority counties for healthcare investment.
County-Level Summary of Vulnerable and Underserved Census Tracts in Pennsylvania
County
# Vulnerable Tracts
# Underserved Tracts
% Underserved
Avg. Distance (mi)
Total Vulnerable Pop.
Underserved Pop.
PERRY
2
2
100.0%
17.53
5,815
5,815
CLINTON
3
2
66.7%
13.84
7,750
4,615
SULLIVAN
3
2
66.7%
18.28
6,949
3,031
BRADFORD
4
2
50.0%
14.14
14,748
7,562
CAMERON
6
3
50.0%
14.09
13,466
6,763
COLUMBIA
2
1
50.0%
9.45
5,897
970
DAUPHIN
2
1
50.0%
10.01
5,838
4,028
JUNIATA
2
1
50.0%
12.56
5,461
1,787
ELK
8
3
37.5%
12.71
19,260
8,045
CLEARFIELD
11
4
36.4%
13.47
36,592
10,359
CENTRE
3
1
33.3%
15.08
11,735
2,167
FOREST
3
1
33.3%
14.09
6,031
2,603
POTTER
6
2
33.3%
10.29
18,675
4,615
BEDFORD
4
1
25.0%
11.45
17,181
4,021
CLARION
8
2
25.0%
12.40
22,104
7,149
FRANKLIN
4
1
25.0%
5.61
14,268
1,787
HUNTINGDON
4
1
25.0%
11.41
12,266
1,787
MIFFLIN
4
1
25.0%
10.28
10,268
1,787
MONROE
4
1
25.0%
12.07
8,388
1,134
CRAWFORD
6
1
16.7%
8.37
16,001
2,661
JEFFERSON
6
1
16.7%
9.40
16,311
4,546
LYCOMING
6
1
16.7%
8.27
21,547
970
TIOGA
6
1
16.7%
10.30
21,484
1,953
VENANGO
6
1
16.7%
10.49
14,597
2,603
MCKEAN
7
1
14.3%
10.04
16,897
2,662
ARMSTRONG
8
1
12.5%
10.74
23,310
4,546
SOMERSET
8
1
12.5%
9.24
24,350
4,021
INDIANA
9
1
11.1%
7.36
31,370
4,546
WARREN
9
1
11.1%
7.01
27,976
2,603
NORTHUMBERLAND
10
1
10.0%
11.66
33,370
4,028
LUZERNE
14
1
7.1%
5.30
35,497
970
ALLEGHENY
47
0
0.0%
2.45
109,061
0
BEAVER
8
0
0.0%
4.34
17,930
0
BERKS
2
0
0.0%
2.95
4,073
0
BLAIR
5
0
0.0%
2.97
14,939
0
BUTLER
3
0
0.0%
8.60
9,545
0
CAMBRIA
14
0
0.0%
6.35
34,583
0
CARBON
3
0
0.0%
8.14
7,325
0
CUMBERLAND
2
0
0.0%
1.64
6,734
0
DELAWARE
7
0
0.0%
1.27
26,294
0
ERIE
7
0
0.0%
2.36
20,784
0
FAYETTE
16
0
0.0%
4.05
50,911
0
FULTON
1
0
0.0%
10.64
3,354
0
LACKAWANNA
10
0
0.0%
4.01
29,208
0
LANCASTER
2
0
0.0%
6.60
8,186
0
LAWRENCE
5
0
0.0%
5.80
12,399
0
LEBANON
2
0
0.0%
3.61
8,456
0
LEHIGH
4
0
0.0%
2.09
11,383
0
MERCER
5
0
0.0%
4.02
16,963
0
MONTGOMERY
3
0
0.0%
1.12
7,081
0
MONTOUR
1
0
0.0%
0.56
4,282
0
NORTHAMPTON
4
0
0.0%
2.41
11,585
0
PHILADELPHIA
21
0
0.0%
1.00
73,982
0
PIKE
2
0
0.0%
14.14
3,903
0
SCHUYLKILL
6
0
0.0%
4.38
17,988
0
SUSQUEHANNA
1
0
0.0%
5.79
1,658
0
UNION
3
0
0.0%
4.15
16,364
0
WASHINGTON
5
0
0.0%
4.66
11,865
0
WAYNE
4
0
0.0%
8.73
9,971
0
WESTMORELAND
22
0
0.0%
4.05
54,779
0
YORK
4
0
0.0%
1.44
11,785
0
Requirements: - Use knitr::kable() or similar for formatting - Include descriptive column names - Format numbers appropriately (commas for population, percentages, etc.) - Add an informative caption - Sort by priority (you decide the metric)
Part 2: Comprehensive Visualization
Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.
Map 1: County-Level Choropleth
Create a choropleth map showing healthcare access challenges at the county level.
Your Task:
library(sf)library(tidyverse)library(ggplot2)library(viridis) library(scales) counties_proj <-st_transform(counties, st_crs(vul_proj))hospitals_proj <-st_transform(hospitals, st_crs(counties_proj))county_map <- counties_proj %>%left_join(county_summary, by =c("COUNTY_NAM"="COUNTY_NAME"))ggplot() +geom_sf(data = county_map,aes(fill = pct_underserved),color ="white", size =0.3) +geom_sf(data = hospitals_proj,shape =21, color ="black", fill ="red", size =2, alpha =0.8)+scale_fill_viridis(name ="% Underserved Vulnerable Tracts",option ="magma",direction =-1,labels =function(x) paste0(round(x, 1), "%") ) +labs(title ="Healthcare Access Challenges in Pennsylvania",subtitle ="Counties colored by percentage of vulnerable tracts that are underserved (>15 miles from nearest hospital)",caption ="Data sources: ACS 2021 (via tidycensus), hospital locations (PA DOH), analysis by Yanyang Chen" ) +theme_void() +theme(legend.position ="right",legend.title =element_text(size =10, face ="bold"),legend.text =element_text(size =8),plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =10, margin =margin(b =8)),plot.caption =element_text(size =8, color ="gray40") )
Requirements: - Fill counties by percentage of vulnerable tracts that are underserved - Include hospital locations as points - Use an appropriate color scheme - Include clear title, subtitle, and caption - Use theme_void() or similar clean theme - Add a legend with formatted labels
Map 2: Detailed Vulnerability Map
Create a map highlighting underserved vulnerable tracts.
Your Task:
# Create detailed tract-level maplibrary(sf)library(tidyverse)library(ggplot2)tracts_proj <-st_transform(vul_proj, st_crs(counties))counties_proj <-st_transform(counties, st_crs(tracts_proj))hospitals_proj <-st_transform(hospitals, st_crs(tracts_proj))ggplot() +geom_sf(data = counties_proj, fill =NA, color ="gray60", size =0.3) +geom_sf(data = tracts_proj, aes(geometry = geometry),fill ="lightgray", color =NA) +geom_sf(data =filter(tracts_proj, underserved ==TRUE),aes(geometry = geometry), fill ="#d73027", color =NA, alpha =0.9) +geom_sf(data = hospitals_proj,shape =21, fill ="yellow", color ="black", size =2, alpha =0.9) +labs(title ="Map 2. Detailed Tract-Level Vulnerability in Pennsylvania",subtitle ="Underserved vulnerable tracts (>15 miles from nearest hospital) shown in red",caption ="Data sources: ACS 2021 via tidycensus; Hospital data: PA Department of Health" ) +theme_void() +theme(plot.title =element_text(size =14, face ="bold", color ="black"),plot.subtitle =element_text(size =10, color ="gray30"),plot.caption =element_text(size =8, color ="gray40"),panel.background =element_rect(fill ="white", color =NA) )
Requirements: - Show underserved vulnerable tracts in a contrasting color - Include county boundaries for context - Show hospital locations - Use appropriate visual hierarchy (what should stand out?) - Include informative title and subtitle
Chart: Distribution Analysis
Create a visualization showing the distribution of distances to hospitals for vulnerable populations.
Your Task:
# Create distribution visualizationlibrary(ggplot2)library(scales)# Step: Distribution of distances for vulnerable tracts ----ggplot(vul_proj, aes(x = dist_to_hosp_mi)) +# 直方图部分geom_histogram(aes(y =after_stat(density)), bins =30, fill ="#3182bd", color ="white", alpha =0.7) +# 密度曲线部分geom_density(color ="#de2d26", size =1, alpha =0.6) +# 标题与坐标轴labs(title ="Distribution of Distances to Nearest Hospital",subtitle ="For vulnerable census tracts across Pennsylvania",x ="Distance to Nearest Hospital (miles)",y ="Density",caption ="Data: ACS 2021 (via tidycensus) and PA DOH hospital locations" ) +# 美观格式theme_minimal(base_size =12) +theme(plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =10, color ="gray30"),plot.caption =element_text(size =8, color ="gray40"),panel.grid.minor =element_blank() )
Suggested chart types: - Histogram or density plot of distances - Box plot comparing distances across regions - Bar chart of underserved tracts by county - Scatter plot of distance vs. vulnerable population size
Requirements: - Clear axes labels with units - Appropriate title - Professional formatting - Brief interpretation (1-2 sentences as a caption or in text)
Part 3: Bring Your Own Data Analysis
Choose your own additional spatial dataset and conduct a supplementary analysis.
Challenge Options
Choose ONE of the following challenge exercises, or propose your own research question using OpenDataPhilly data (https://opendataphilly.org/datasets/).
Note that these are just loose suggestions to spark ideas; follow them or make your own as the data permits and as your ideas evolve. This analysis should include bringing in your own dataset and ensuring that the projection/CRS of your layers align and are appropriate for the analysis (not lat/long or geodetic coordinate systems). The analysis portion should include some combination of spatial and attribute operations to answer a relatively straightforward question.
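The CRS requirement above can be checked in a few lines. The following is a minimal sketch, assuming two toy layers (`pts_ll` and `poly_utm` are hypothetical stand-ins, not course data): one in lon/lat and one already in NAD83 / UTM Zone 18N.

```r
library(sf)

# Toy layers standing in for real data (assumptions for illustration only):
# two points in lon/lat (EPSG:4326) and one polygon in EPSG:26918.
pts_ll <- st_as_sf(
  data.frame(id = 1:2, lon = c(-75.16, -75.20), lat = c(39.95, 40.03)),
  coords = c("lon", "lat"), crs = 4326
)
poly_utm <- st_sf(
  name = "study_area",
  geometry = st_sfc(
    st_polygon(list(rbind(c(470000, 4410000), c(500000, 4410000),
                          c(500000, 4440000), c(470000, 4440000),
                          c(470000, 4410000)))),
    crs = 26918
  )
)

# Degree-based layers are not suitable for buffers or distances
st_is_longlat(pts_ll)    # TRUE
st_is_longlat(poly_utm)  # FALSE

# Project the points to match the polygon layer
pts_utm <- st_transform(pts_ll, st_crs(poly_utm))

# Confirm the layers share a CRS before any spatial operation
stopifnot(st_crs(pts_utm) == st_crs(poly_utm))
```

Running this kind of check once, right after loading each layer, tends to catch mismatches before they surface as empty joins or implausible distances.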
Education & Youth Services
Option A: Educational Desert Analysis - Data: Schools, Libraries, Recreation Centers, Census tracts (child population) - Question: “Which neighborhoods lack adequate educational infrastructure for children?” - Operations: Buffer schools/libraries (0.5 mile walking distance), identify coverage gaps, overlay with child population density - Policy relevance: School district planning, library placement, after-school program siting
Option B: School Safety Zones - Data: Schools, Crime Incidents, Bike Network - Question: “Are school zones safe for walking/biking, or are they crime hotspots?” - Operations: Buffer schools (1000ft safety zone), spatial join with crime incidents, assess bike infrastructure coverage - Policy relevance: Safe Routes to School programs, crossing guard placement
Environmental Justice
Option C: Green Space Equity - Data: Parks, Street Trees, Census tracts (race/income demographics) - Question: “Do low-income and minority neighborhoods have equitable access to green space?” - Operations: Buffer parks (10-minute walk = 0.5 mile), calculate tree canopy or park acreage per capita, compare by demographics - Policy relevance: Climate resilience, environmental justice, urban forestry investment
Public Safety & Justice
Option D: Crime & Community Resources - Data: Crime Incidents, Recreation Centers, Libraries, Street Lights - Question: “Are high-crime areas underserved by community resources?” - Operations: Aggregate crime counts to census tracts or neighborhoods, count community resources per area, spatial correlation analysis - Policy relevance: Community investment, violence prevention strategies
Infrastructure & Services
Option E: Polling Place Accessibility - Data: Polling Places, SEPTA stops, Census tracts (elderly population, disability rates) - Question: “Are polling places accessible for elderly and disabled voters?” - Operations: Buffer polling places and transit stops, identify vulnerable populations, find areas lacking access - Policy relevance: Voting rights, election infrastructure, ADA compliance
Health & Wellness
Option F: Recreation & Population Health - Data: Recreation Centers, Playgrounds, Parks, Census tracts (demographics) - Question: “Is lack of recreation access associated with vulnerable populations?” - Operations: Calculate recreation facilities per capita by neighborhood, buffer facilities for walking access, overlay with demographic indicators - Policy relevance: Public health investment, recreation programming, obesity prevention
Emergency Services
Option G: EMS Response Coverage - Data: Fire Stations, EMS stations, Population density, High-rise buildings - Question: “Are population-dense areas adequately covered by emergency services?” - Operations: Create service area buffers (5-minute drive = ~2 miles), assess population coverage, identify gaps in high-density areas - Policy relevance: Emergency preparedness, station siting decisions
Arts & Culture
Option H: Cultural Asset Distribution - Data: Public Art, Museums, Historic sites/markers, Neighborhoods - Question: “Do all neighborhoods have equitable access to cultural amenities?” - Operations: Count cultural assets per neighborhood, normalize by population, compare distribution across demographic groups - Policy relevance: Cultural equity, tourism, quality of life, neighborhood identity
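Several of the options above share one pattern: buffer a set of facilities, then flag the areas that fall outside every buffer. A minimal sketch of that pattern, assuming hypothetical toy points (`facilities`, `tracts`) in a projected CRS measured in meters; real layers would come from the sources listed under Data Sources.

```r
library(sf)

# Hypothetical facilities and tract centroids in EPSG:26918 (meters).
# The coordinates are made up for illustration.
facilities <- st_as_sf(
  data.frame(id = 1:2, x = c(480000, 483000), y = c(4420000, 4420000)),
  coords = c("x", "y"), crs = 26918
)
tracts <- st_as_sf(
  data.frame(tract = c("A", "B", "C"),
             x = c(480500, 483500, 481500),
             y = c(4420000, 4420000, 4420000)),
  coords = c("x", "y"), crs = 26918
)

# Half a mile expressed in meters, to match the CRS units
half_mile_m <- 0.5 * 1609.344

# Spatial operation 1: buffer the facilities and dissolve into one service area
service_area <- st_union(st_buffer(facilities, dist = half_mile_m))

# Spatial operation 2: a tract is covered if it intersects the service area
tracts$covered <- lengths(st_intersects(tracts, service_area)) > 0

# Tracts outside every buffer are the coverage gaps
gaps <- tracts[!tracts$covered, ]
```

With polygon tracts instead of centroids, the same `st_intersects()` call would mark a tract as covered if any part of it touches a buffer, so a centroid-based or area-share rule may be a better fit depending on the question.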
Data Sources
OpenDataPhilly: https://opendataphilly.org/datasets/ - Most datasets available as GeoJSON, Shapefile, or CSV with coordinates - Always check the Metadata for a data dictionary of the fields.
Additional Sources: - Pennsylvania Open Data: https://data.pa.gov/ - Census Bureau (via tidycensus): Demographics, economic indicators, commute patterns - TIGER/Line (via tigris): Geographic boundaries
Recommended Starting Points
If you’re feeling confident: Choose an advanced challenge with multiple data layers. If you are a beginner, choose something more manageable that helps you understand the basics
If you have a different idea: Propose your own question! Just make sure: - You can access the spatial data - You can perform at least 2 spatial operations
X Y
Min. :428729 Min. :4406078
1st Qu.:476765 1st Qu.:4422717
Median :485018 Median :4428790
Mean :483162 Mean :4429572
3rd Qu.:489725 3rd Qu.:4435632
Max. :520914 Max. :4464839
# ----------------------------
# 3. Put all layers in a common CRS
# ----------------------------
polling_proj    <- st_transform(polling_proj, 26918)
bus_stops       <- st_transform(bus_stops, 26918)
philly_boundary <- st_transform(philly_boundary, 26918)

# ----------------------------
# 4. Compute distance from each polling place to the nearest bus stop (meters)
# ----------------------------
nearest_dist <- st_distance(polling_proj, bus_stops)
polling_proj$nearest_dist_m <- apply(nearest_dist, 1, min)

# ----------------------------
# 5. Define underserved (>500 m is treated as not transit-accessible)
# ----------------------------
polling_proj$underserved <- ifelse(polling_proj$nearest_dist_m > 500, 1, 0)

# Compute the share and build the dynamic annotation text
pct_underserved <- round(mean(polling_proj$underserved) * 100, 1)
annotation_text <- paste0("Underserved Polling Places: ", pct_underserved, "%")

# ----------------------------
# 6. Map (three layers plus dynamic percentage text)
# ----------------------------
ggplot() +
  # City boundary basemap
  geom_sf(data = philly_boundary, fill = "grey95", color = "black", linewidth = 0.3) +
  # Bus stops
  geom_sf(data = bus_stops, color = "blue", size = 0.3, alpha = 0.6) +
  # Polling places (red = underserved, green = served)
  geom_sf(data = polling_proj,
          aes(color = as.factor(underserved)),
          size = 1.4, alpha = 0.9) +
  scale_color_manual(
    values = c("0" = "green3", "1" = "red3"),
    labels = c("Served", "Underserved"),
    name = "Polling Place"
  ) +
  coord_sf(
    xlim = st_bbox(philly_boundary)[c("xmin", "xmax")],
    ylim = st_bbox(philly_boundary)[c("ymin", "ymax")],
    expand = FALSE
  ) +
  theme_void() +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(size = 11, color = "grey30"),
    plot.caption = element_text(size = 9, color = "grey40")
  ) +
  labs(
    title = "Polling Place Accessibility and Transit Coverage in Philadelphia",
    subtitle = "Red = Underserved (>500m from nearest bus stop) | Green = Served | Blue = Bus Stops",
    caption = "Data: OpenDataPhilly, SEPTA GTFS, Pennsylvania County Boundaries"
  ) +
  # Dynamic text annotation (bottom right)
  annotate("text",
           x = st_bbox(philly_boundary)[["xmax"]] - 6000,
           y = st_bbox(philly_boundary)[["ymin"]] + 4000,
           label = annotation_text,
           hjust = 1, vjust = 0, color = "grey20",
           size = 4, fontface = "italic")
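The nearest-distance step above builds a full polling-place by bus-stop distance matrix, which grows quickly with layer size. As an alternative (not the approach used above), a minimal sketch with `st_nearest_feature()` is shown here; the `polling` and `stops` objects are hypothetical toy layers with made-up coordinates.

```r
library(sf)

# Hypothetical toy layers in EPSG:26918 (meters); coordinates are made up
polling <- st_as_sf(
  data.frame(id = 1:2, x = c(480000, 481000), y = c(4420000, 4420000)),
  coords = c("x", "y"), crs = 26918
)
stops <- st_as_sf(
  data.frame(stop = c("s1", "s2"), x = c(480300, 482000), y = c(4420000, 4420000)),
  coords = c("x", "y"), crs = 26918
)

# Index of the nearest stop for each polling place
idx <- st_nearest_feature(polling, stops)

# Element-wise distance to that nearest stop only, converted to plain numbers
polling$nearest_dist_m <- as.numeric(
  st_distance(polling, stops[idx, ], by_element = TRUE)
)

# Same 500 m threshold as in the analysis above
polling$underserved <- as.integer(polling$nearest_dist_m > 500)
```

This returns one distance per polling place rather than one per pair, so memory use scales with the number of polling places instead of the product of both layers.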
# A tibble: 2 × 3
status count pct
<chr> <int> <dbl>
1 Served (≤500m) 1700 99.8
2 Underserved (>500m) 3 0.2
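The code that produced the summary table above is not shown in this section. One way such a table could be built is sketched below, assuming a data frame with a `nearest_dist_m` column; `polling_toy` and its values are made up for illustration and are not the real data.

```r
library(dplyr)

# Toy stand-in for the polling layer: only the distance column is needed
polling_toy <- tibble(nearest_dist_m = c(120, 480, 500, 650))

polling_summary <- polling_toy %>%
  # Label each polling place using the 500 m threshold
  mutate(status = if_else(nearest_dist_m > 500,
                          "Underserved (>500m)",
                          "Served (≤500m)")) %>%
  # Count polling places per status
  count(status, name = "count") %>%
  # Convert counts to percentages rounded to one decimal place
  mutate(pct = round(count / sum(count) * 100, 1))

polling_summary
```

If the input is an `sf` object, calling `st_drop_geometry()` first keeps the summary a plain table, which is usually what a bar chart needs.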
# ----------------------------
# 2. Draw the summary bar chart
# ----------------------------
ggplot(polling_summary, aes(x = status, y = pct, fill = status)) +
  geom_col(width = 0.6) +
  geom_text(aes(label = paste0(pct, "%")), vjust = -0.3, size = 5, fontface = "bold") +
  scale_fill_manual(values = c("Underserved (>500m)" = "red3",
                               "Served (≤500m)" = "green3")) +
  theme_minimal(base_size = 14) +
  labs(
    title = "Polling Place Accessibility Summary in Philadelphia",
    subtitle = "Proportion of Polling Places within / beyond 500m of a Bus Stop",
    x = NULL, y = "Percentage of Polling Places",
    caption = "Data: OpenDataPhilly, SEPTA GTFS"
  ) +
  theme(legend.position = "none")
Questions to answer:
- What dataset did you choose and why? Philadelphia polling places and SEPTA bus stops, chosen to assess public transit accessibility to voting locations.
- What is the data source and date? OpenDataPhilly, SEPTA GTFS feed (2023).
- How many features does it contain? ~800 polling places, ~12,000 bus stops.
- What CRS is it in? Did you need to transform it? NAD83 / UTM Zone 18N (EPSG:26918); transformed from WGS84 (EPSG:4326) for meter-based distance analysis.
Pose a research question
Write a clear research statement that your analysis will answer.
Do polling places in Philadelphia have adequate access to public transit, and which areas remain underserved by bus routes within a 500-meter distance?
Conduct spatial analysis
Use at least TWO spatial operations to answer your research question.
Analysis requirements: - Clear code comments explaining each step - Appropriate CRS transformations - Summary statistics or counts - At least one map showing your findings - Brief interpretation of results (3-5 sentences)
Your interpretation:
The spatial analysis shows that 99.8% of polling places in Philadelphia are located within 500 meters of a bus stop, indicating strong transit accessibility to voting locations. Only 0.2% (3 polling places) are classified as underserved, mostly on the city’s outer edges.
This suggests that Philadelphia’s public transit network effectively supports access to polling sites, minimizing transportation barriers for most residents. Future analysis could compare underserved areas with demographic data to identify any equity concerns.
Finally - A few comments about your incorporation of feedback!
Take a few moments to clean up your markdown document and then write a line or two or three about how you may have incorporated feedback that you received after your first assignment.
Incorporation of Feedback
After receiving feedback from the first assignment, I focused on improving spatial data organization and clarity in visualization. This time, I made sure to use consistent CRS across all layers, clean up unnecessary warnings, and add clear map annotations to make results easier to interpret. I also structured the code more logically and added comments to improve readability.
Submission Requirements
What to submit:
Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
Use embed-resources: true in YAML so it’s a single file
All code should run without errors
All maps and charts should display correctly
File naming: LastName_FirstName_Assignment2.html and LastName_FirstName_Assignment2.qmd